| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With current makefiles the build fails because source and build paths
are generated incorrectly. With the Android build system the top_srcdir and
top_builddir variables are undefined and all paths are relative to where
Android.mk is located. This ends up with paths like
external/mesa/src/mesa/src/mesa/ for both source and build paths, which
are obviously wrong.
This patch fixes this by overriding resulting SRCDIR and BUILDDIR
variables with empty string, so that paths end up being relative to
Android.mk file again. Appending correct build path to generated files
is already done in Android.gen.mk.
Signed-off-by: Tomasz Figa <[email protected]>
CC: <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit b4ffd19e6c9f61dfa4e0eda1f606cd255b27208f)
|
|
|
|
|
|
|
|
|
|
|
| |
Current Android makefiles lack generation of format_info.c, which is
a dependency of main/format.c. This patch adds necessary code to
Android.gen.mk.
Signed-off-by: Tomasz Figa <[email protected]>
CC: <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit 98445fd25e4f0bd7dc4d7a2a843b7fbe76c9756d)
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes Android build failures by including src/util directory
in compilation. Files inside of this directory are compiled into
libmesa_util static library and linked with resulting libGLES_mesa.
Signed-off-by: Tomasz Figa <[email protected]>
CC: <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
(cherry picked from commit d703abf735bc2fe27af893d07e44598b8601b172)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <[email protected]>
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit 3202926746298468805f54ac5b39d62f9585dabf)
|
|
|
|
|
|
|
|
|
|
|
| |
Commit "st/xa: scissor to help tilers" broke xa_yuv_planar_blit() and vmwgfx
textured video. Fix this by implementing scissors also in the yuv draw path.
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 46537f1d03ba6de83be70ac574f633bb4342a327)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some users don't understand that these variables can break OpenGL.
The general rule is that if an app supports MSAA, you mustn't use
GALLIUM_MSAA.
For example, if an app has an 8xMSAA FBO and GALLIUM_MSAA=4
is set, resolving the FBO to the back buffer will be rejected which will look
like this on all gallium drivers:
http://www.phoronix.com/scan.php?page=article&item=amd_radeonsi_msaa
The environment variables also have no effect on modern apps like TF2, but
there is still a performance hit due to wasted bandwidth and VRAM.
In a nutshell, it does more harm than good.
Cc: 10.2 10.3 <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
(cherry picked from commit 8449121971ce1db03fea19665d314e523fdc10dd)
|
|
|
|
|
|
|
|
|
|
| |
This is the only guaranteed way to get the patch level for llvm,
since the define cannot always be found in config.h depending
on the version of llvm or the build system used.
CC: 10.2 10.3 <[email protected]>
Reviewed-by: Jonathan Gray <[email protected]>
(cherry picked from commit ec566e0f169dac33814463e913e5d844a782c61e)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the thing being dereferenced is a record or an array of records, it
should be treated as row-major. The ir_type_dereference_record path
already does this, and I think I intended to do the same for this path
in b17a4d5d.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83741
Cc: [email protected]
(cherry picked from commit c3f17bb18f597d7f606805ae94363dae7fd51582)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per rule #9, the size of the structure is vec4 aligned. The MAX2 in the
loop ensures that sizes >= 16 bytes are vec4 aligned. The new MAX2
after the loop ensures that sizes < 16 bytes are vec4 aligned.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Cc: [email protected]
(cherry picked from commit 2ab71e1486e76722154b48faef8216ff8173fd30)
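The vec4 rounding described above can be illustrated with a simplified model of std140 structure sizing. This is only a sketch, not the linker's actual code: the function name `std140_struct_size`, the local `MAX2` definition, and the flat arrays of member alignments and sizes are all assumptions made for the example.

```c
#include <assert.h>

/* Local stand-in for a MAX2-style macro; defined here for illustration. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Round value up to a multiple of alignment (alignment must be a power of two). */
static unsigned align_up(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/*
 * Hypothetical helper: compute the std140 size of a structure from the
 * base alignment and size (both in bytes) of each member.
 */
unsigned std140_struct_size(const unsigned *member_align,
                            const unsigned *member_size,
                            unsigned count)
{
    unsigned max_align = 0;
    unsigned offset = 0;
    unsigned i;

    for (i = 0; i < count; i++) {
        /* Place each member at its own base alignment. */
        offset = align_up(offset, member_align[i]);
        offset += member_size[i];
        /* Track the largest member alignment seen so far. */
        max_align = MAX2(max_align, member_align[i]);
    }

    /* Rule 9: the structure is at least vec4 (16-byte) aligned, even when
     * every member is smaller than a vec4. */
    max_align = MAX2(max_align, 16);

    return align_up(offset, max_align);
}
```

Under this model a structure holding a single float occupies 16 bytes, which is the small-structure case the second MAX2 addresses.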
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whether or not the field is row-major (because it might be a bvec2 or
something) does not affect the array itself. We need to know whether an
array element in its entirety is row-major.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83506
Cc: [email protected]
(cherry picked from commit 5c75270c344815b15ef73e83421192fd7de35972)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the linker would correctly calculate the layout, but the
lower_ubo_reference pass would not apply correct alignment to fields
following small (less than 16-byte) nested structures.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83533
Cc: [email protected]
(cherry picked from commit 8e01c66da6c780601f941aa5b9939962c219fdbd)
|
|
|
|
|
|
|
|
|
|
| |
Such buffers are only useful when the CPU reads from them, so we
need to make sure CPU reads are fast.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84178
Reviewed-by: Marek Olšák <[email protected]>
Cc: [email protected]
(cherry picked from commit 7e55c3b352b6616fa2780f683dd6c8e1a3f61815)
|
|
|
|
|
|
|
|
|
|
| |
There is no dedicated instruction for this, so just combine it with the
constant offset.
Acked-by: Ben Skeggs <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit a5bbfeda977a62aa3349f0c7d04c5c20156c1faf)
|
|
|
|
|
|
|
|
|
| |
This was missed in the commit that enabled it for fermi/kepler as part
of ARB_gpu_shader5.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit cdc4de121564a47cbdac760622b6dc7112e548aa)
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit 0532a5fd00cdddda0fd1727fb519cb4312f47e83)
|
|
|
|
|
|
|
|
| |
This parallels the fixes in commit afea9bae.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit d3c3bba6d07c97cfc1499a6bda73337584943971)
|
|
|
|
|
|
|
|
|
|
|
|
| |
What happens is that a SPLIT operation is part of the spill node, and as
a pseudo op, the instruction gets erased after processing its first def.
However the later defs still need to refer to it, so instead delay
deleting until after that whole RA node is done processing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 0147c10c5f00b43696ba660aab604d674a75e83c)
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm not familiar with this code, but this sure appears to be a typo.
It looks like the intent is to set each array element, not arrays[0]
each time. Notably, the loop just below uses "array", not "arrays".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Fredrik Höglund <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: [email protected]
(cherry picked from commit f81052dc9b99eca765a44decd01af0335350d0b2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code in get.c that handles this uses ctx->Array.VAO->VertexAttrib,
which is a gl_vertex_attrib_array structure, not a gl_client_array.
The offsets of all fields happened to be the same in both structures, at
least on x86_64. "Size," "Type," and "Stride" are obviously the same:
both structures start with the same fields, in the same order.
"Enabled" is dicier: there are different fields before it in both
structures, including pointer sized values which might need special
alignment.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Fredrik Höglund <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: [email protected]
(cherry picked from commit d0ec6e85099af68b8a36f9815f4e3d43d767bb38)
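Why matching offsets are safe for the leading fields but fragile for a later one can be sketched with `offsetof`. The two structures below are hypothetical stand-ins, not the real gl_client_array and gl_vertex_attrib_array definitions; only the field names Size, Type, Stride and Enabled come from the message above, and the fields in between are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure with a pointer before Enabled. */
struct client_array_like {
    int Size;
    unsigned Type;
    int Stride;
    const void *Ptr;        /* pointer-sized, may force extra padding */
    unsigned char Enabled;
};

/* Hypothetical structure with a 32-bit field before Enabled. */
struct attrib_array_like {
    int Size;
    unsigned Type;
    int Stride;
    unsigned RelativeOffset;
    unsigned char Enabled;
};

/* Returns 1 when the shared leading fields sit at identical offsets. */
int leading_fields_match(void)
{
    return offsetof(struct client_array_like, Size) ==
               offsetof(struct attrib_array_like, Size) &&
           offsetof(struct client_array_like, Type) ==
               offsetof(struct attrib_array_like, Type) &&
           offsetof(struct client_array_like, Stride) ==
               offsetof(struct attrib_array_like, Stride);
}

/* Returns 1 when Enabled also happens to match; this depends on the ABI. */
int enabled_offset_matches(void)
{
    return offsetof(struct client_array_like, Enabled) ==
           offsetof(struct attrib_array_like, Enabled);
}
```

In this invented layout the leading fields always line up, while whether `Enabled` lines up depends on pointer size and alignment, which is the kind of accident the fix stops relying on.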
|
|
|
|
|
|
|
|
| |
Cc: 10.2 10.3 <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
(cherry picked from commit dc05a9e4e089d66a2ffe8919857ad9660e108c28)
[Emil Velikov: remove unref scratch_bo, s/si_shader/si_pipe_shader/]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Cc: 10.2 10.3 <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
(cherry picked from commit 711623f7c8113d43f2d54ebfe5cbed3d406a3c79)
[Emil Velikov: s/ring/ring.buffer/]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Width and Height of the imported image were never initialized from the
imported bo.
Cc: 10.2 10.3 <[email protected]>
Signed-off-by: Andreas Pokorny <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
(cherry picked from commit df341320c9be34c40b76e42510640120e0ebe0d3)
|
|
|
|
|
|
|
|
|
|
| |
This change enables EGL_KHR_image_pixmap in the egl drm platform, which is implemented
there but has not been advertised yet.
Cc: 10.2 10.3 <[email protected]>
Signed-off-by: Andreas Pokorny <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
(cherry picked from commit 53b614bfd3c12368347b2953121e815add68d90b)
|
|
|
|
|
|
|
|
|
|
|
|
| |
ffeb77c7b0552a8624e46e65d6347240ac5ae84d had a typo which turned all signed
integer divisions into unsigned ones. Oops.
This gets us back the 51 little piglits
(all from glsl built-in-functions, fs/vs/gs-op-div-int-ivec2 and similar).
Cc: "10.2 10.3" <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
(cherry picked from commit 5e1fcc625824ae962d5f658e151e6bc2665adce8)
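The effect of treating a signed division as unsigned can be seen in a small scalar example. This is a sketch only; the function names are made up for illustration and the real code operates on vectors rather than single integers.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division: truncates toward zero, keeps the sign. */
int32_t div_signed(int32_t a, int32_t b)
{
    return a / b;
}

/* The same bit patterns divided as unsigned values: a negative
 * dividend is reinterpreted as a large positive number first. */
int32_t div_as_unsigned(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a / (uint32_t)b);
}
```

For example, `div_signed(-8, 2)` yields -4, while `div_as_unsigned(-8, 2)` divides 0xFFFFFFF8 by 2 and yields 2147483644.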
|
|
|
|
|
|
|
|
|
|
|
|
| |
While the result of signed integer division by zero is undefined by glsl
(and doesn't exist with d3d10), we must not crash, so need to make sure we
don't get sigfpe much like udiv already does.
Unlike udiv where we return 0xffffffff (as required by d3d10) there is
no requirement right now to return anything specific so we use zero.
(cherry picked from commit ffeb77c7b0552a8624e46e65d6347240ac5ae84d)
Nominated-by: Roland Scheidegger <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83570
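The divide-by-zero policy described above can be sketched for scalars as follows. The helper names `safe_idiv` and `safe_udiv` are hypothetical, and the real implementation is vectorized, so this only illustrates the chosen results: zero for signed division and 0xffffffff for unsigned division.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division that never traps on a zero divisor; the result for
 * that case is unspecified by the API, so zero is used.
 * Note: INT32_MIN / -1 is a separate overflow case not covered here. */
int32_t safe_idiv(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;
    return a / b;
}

/* Unsigned division returning all ones on a zero divisor. */
uint32_t safe_udiv(uint32_t a, uint32_t b)
{
    if (b == 0)
        return 0xffffffffu;
    return a / b;
}
```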
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec says that mem objects should maintain a stack of callbacks,
not just one.
v2:
- Remove stray printf.
Reviewed-by: Francisco Jerez <[email protected]>
CC: "10.3" <[email protected]>
(cherry picked from commit c6d980140913307d48648058ec24da42a31fc37c)
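A stack of destructor callbacks, as opposed to a single slot, might look like the following sketch. It is written in plain C with a fixed-size array for brevity; the type and function names (`mem_object`, `mem_object_push_callback`, `mem_object_destroy`, `record_id`) are hypothetical and do not reflect the actual implementation. The OpenCL specification has registered callbacks invoked in the reverse order of registration, which the pop loop mirrors.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*destroy_cb)(void *user_data);

/* Hypothetical memory object that keeps every registered callback. */
struct mem_object {
    destroy_cb callbacks[MAX_CALLBACKS];
    void *user_data[MAX_CALLBACKS];
    size_t count;
};

/* Push a callback; returns 0 on success, -1 if the stack is full. */
int mem_object_push_callback(struct mem_object *obj, destroy_cb cb,
                             void *user_data)
{
    if (obj->count == MAX_CALLBACKS)
        return -1;
    obj->callbacks[obj->count] = cb;
    obj->user_data[obj->count] = user_data;
    obj->count++;
    return 0;
}

/* Pop and invoke callbacks, most recently registered first. */
void mem_object_destroy(struct mem_object *obj)
{
    while (obj->count > 0) {
        obj->count--;
        obj->callbacks[obj->count](obj->user_data[obj->count]);
    }
}

/* Demo callback: records the integer it was registered with. */
int call_log[MAX_CALLBACKS];
size_t call_log_len = 0;

void record_id(void *user_data)
{
    if (call_log_len < MAX_CALLBACKS)
        call_log[call_log_len++] = *(const int *)user_data;
}
```

Registering `record_id` twice with the values 1 and 2 and then destroying the object logs 2 followed by 1.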
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SWZ instruction can have swizzle terms >= 4 (SWIZZLE_ZERO, SWIZZLE_ONE).
These swizzle terms caused a few assertions to fail.
This started happening after the commit "mesa: Actually use the Mesa IR
optimizer for ARB programs." when replaying some apitrace files.
A new piglit test (tests/asmparsertest/shaders/ARBfp1.0/swz-08.txt)
exercises this.
Cc: "10.3" <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
(cherry picked from commit 7b2c7032446da4138dedeee8feaa79d741f1f108)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Luminance is the least-significant byte of the uint16, rather than the
lowest byte in memory. Other parts of mesa already handle this correctly
for big-endian, and swrast already handles other MESA_FORMAT_x8y8 formats
correctly. This case was just an odd-one-out.
Signed-off-by: Richard Sandiford <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
(cherry picked from commit ecc48f83c8359e3ef64ea40dfb6074f4a1a38dc1)
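The difference between "least-significant byte of the uint16" and "lowest byte in memory" can be sketched as below. The function names are hypothetical, and placing alpha in the high byte is an assumption made for the example rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Endian-independent: luminance is the low 8 bits of the 16-bit value. */
uint8_t luminance_from_texel(uint16_t texel)
{
    return (uint8_t)(texel & 0xff);
}

/* Assumed layout for this sketch: the other channel is in the high 8 bits. */
uint8_t high_channel_from_texel(uint16_t texel)
{
    return (uint8_t)(texel >> 8);
}

/* Endian-dependent: the first byte in memory is the low byte on
 * little-endian hosts and the high byte on big-endian hosts. */
uint8_t first_byte_in_memory(uint16_t texel)
{
    uint8_t bytes[2];
    memcpy(bytes, &texel, sizeof(bytes));
    return bytes[0];
}
```

Code that reads `first_byte_in_memory()` as luminance works on little-endian machines only, whereas the mask-and-shift form gives the same answer everywhere.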
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function was using the "X" component as the alpha channel,
rather than setting alpha to 1.0.
Signed-off-by: Richard Sandiford <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
(cherry picked from commit 3ff5c6a6c472288fa5f50d880621f38ea94b9c23)
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 567e2769b81863b6dffdac3826a6b729ce6ea37c ("ra: make the p, q
test more efficient") I unknowingly introduced a new requirement to the
register allocator API: the user must set the register class of all
nodes before setting up their interferences, because
ra_add_conflict_list() now uses the classes of the two interfering
nodes. i965 already did this, but r300g was setting up register classes
interleaved with setting up the interference graph. This led to us
calculating the wrong q total, and in certain cases
e78a01d5e6f77e075fe667a0f0ccb10d89c0dd58 ("ra: optimistically color
only one node at a time") made it so that this bug caused a segfault. In
particular, the error occurred if the q total was decremented to 1 below
0 for the last node to be pushed onto the stack. Since q_total is an
unsigned integer, it overflowed to 0xffffffff, which is what
lowest_q_total happens to be initialized to. This means that we would
fail the "new_q_total < lowest_q_total" check on line 476 of
register_allocate.c, and so the node would never be pushed onto the
stack, which led to segfaults in ra_select() when we failed to ever give
it a register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82828
Cc: "10.3" <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
Tested-by: Pavel Ondračka <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
(cherry picked from commit afd82dcad127b64381ca6d80d0e499368074f474)
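The wraparound described above is easy to reproduce in isolation. The sketch below is not the register allocator's code; `decrement_q_total` and `would_pick_node` are hypothetical helpers that model only the unsigned arithmetic and the strict less-than comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^32 instead of going negative. */
uint32_t decrement_q_total(uint32_t q_total, uint32_t amount)
{
    return (uint32_t)(q_total - amount);
}

/* Models the "new_q_total < lowest_q_total" check: a node is only
 * chosen when it is strictly lower than the current best. */
int would_pick_node(uint32_t new_q_total, uint32_t lowest_q_total)
{
    return new_q_total < lowest_q_total;
}
```

Decrementing a q total of 0 by 1 gives 0xffffffff, which is not strictly less than an initial lowest value of 0xffffffff, so the node is never selected.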
|
|
|
|
|
|
|
|
|
|
| |
This allows for importing foreign buffers in RGB32 native endian
byte order, i.e. DRM_FORMAT_XBGR8888, and DRM_FORMAT_ABGR8888.
Signed-off-by: Gwenole Beauchesne <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit e1c50abf8a0ca1d541c4e2dbd5ed1805ed958ba7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit afe3d1556f6b77031f7025309511a0eea2a3e8df (i965: Stop doing
remapping of "special" regs.) stopped remapping delta_x/delta_y, and
additionally stopped considering them always-live. We later realized
delta_x was used in register allocation, so we actually needed to remap
it, which was fixed in commit 23d782067ae834ad53522b46638ea21c62e94ca3
(i965/fs: Keep track of the register that hold delta_x/delta_y.).
However, that commit didn't restore the "always consider it live" part.
If all the code using delta_x was eliminated, fs_visitor::delta_x would
be left pointing at its old register number. Later code in register
allocation would handle that register number specially...even though it
wasn't actually delta_x.
To combat this, set delta_x/y to BAD_FILE if they're eliminated, and
check for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit 78bd12619474e98503965541c61c5d7e9c408110)
|
|
|
|
|
|
|
|
|
|
|
| |
Fallback cases in lp_bld_arit.c used 2^24 to mean "2 to the power 24",
but in C it's "2 xor 24", i.e. 26. Fixed by using 1<< instead.
Signed-off-by: Richard Sandiford <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Cc: "10.2 10.3" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
(cherry picked from commit 1a65629ccc590fe04a97b6df63d73e349b793619)
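The xor-versus-power confusion can be checked directly. The helper names below are made up for illustration; operands are passed as parameters so that compilers which warn about a literal `2 ^ 24` stay quiet.

```c
#include <assert.h>

/* In C, ^ is bitwise exclusive-or, not exponentiation. */
int xor_of(int a, int b)
{
    return a ^ b;
}

/* Two to the power n, written as a shift (valid for 0 <= n < 31). */
int pow2(int n)
{
    return 1 << n;
}
```

Here `xor_of(2, 24)` is 26 (binary 00010 xor 11000 = 11010), while `pow2(24)` is 16777216.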
|
|
|
|
|
|
|
|
| |
Reported by Coverity
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit b13a4ca3f7f622cbf688eec14d3f4156533af44e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unreference the ctx->_Shader object before we delete all the pipeline
objects in the hash table. Before, ctx->_Shader could point to freed
memory when _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL)
was called.
Fixes crash when exiting the piglit rendezvous_by_location test on
Windows.
Cc: [email protected]
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 0d73ac6b02cac46d4a8f3cd1ffa591e071577fa7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Needed for assert.
Fixes build on BE archs with -Werror=implicit-function-declaration.
In file included from
../../../../../src/gallium/auxiliary/draw/draw_fs.c:30:0:
../../../../../src/gallium/auxiliary/util/u_math.h: In function
'util_memcpy_cpu_to_le32':
../../../../../src/gallium/auxiliary/util/u_math.h:810:4: error:
implicit declaration of function 'assert'
[-Werror=implicit-function-declaration]
assert(n % 4 == 0);
^
Cc: "10.3" <[email protected]>
Signed-off-by: Andreas Boll <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit 2a13ff954d3d8cea73bbcf728edffa867828cb78)
|
|
|
|
|
|
|
|
| |
The _Enabled property already has the relevant information.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 3c81de58512f0615df1d90aa79a22c9a44c7189e)
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 79959e5de518c59b327a9df4a6fa80a68213b873)
|
|
|
|
|
|
|
|
|
| |
No idea why it was added, but the code runs fine even on videos
where it triggers.
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 8ab85bfcd5ddd44c50e5b384222731cb2a1a1496)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a regression from "nouveau/vdec: small fixes to h264 handling".
New picking order for frames:
1. Vidbuf pointer matches.
2. Take the first kicked ref.
3. If that fails, take a ref that has a different last_used.
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit a41aad843108cec1901c88a76d5ceb4ede2e062b)
|
|
|
|
|
|
|
|
| |
Reorder some fields to make I-frame decoding work correctly.
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit 121ceb38f45daacc938349d9d5aa82776b78dbab)
|
|
|
|
|
|
|
|
|
|
| |
The BSP bo might be too small to contain all of the bsp data, so
bump its size on overflow. Also bump inter_bo when this happens, since
it might be too small otherwise.
Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: "10.2 10.3" <[email protected]>
(cherry picked from commit f6afed7076a6ef446dbec7cb10c8f8c60efafccd)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the source is not a GRF, it could have a register >= virtual_grf_count.
Accessing virtual_grf_end with such a register would lead to
out-of-bounds access. Make sure the source is a GRF before accessing
virtual_grf_end.
Fixes Valgrind complaints while compiling some shaders.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
(cherry picked from commit 7aeb853c90c2e84fdd4b6b0af97566562c912861)
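The guard described above amounts to checking the register file before indexing. The sketch below uses hypothetical types and a hypothetical helper name; only `virtual_grf_end`, `virtual_grf_count` and the GRF file come from the message, and the extra range check is a defensive addition for the example.

```c
#include <assert.h>

/* Hypothetical register files; only GRF sources index virtual_grf_end. */
enum reg_file { BAD_FILE, GRF, MRF, IMM };

struct src_reg {
    enum reg_file file;
    int reg;
};

/* Returns the recorded end of the source's live range, or -1 when the
 * source is not a GRF (its register number is then meaningless as an
 * index into virtual_grf_end). */
int grf_end_or_minus_one(const struct src_reg *src,
                         const int *virtual_grf_end,
                         int virtual_grf_count)
{
    if (src->file != GRF)
        return -1;
    /* Defensive range check, added for this sketch. */
    if (src->reg < 0 || src->reg >= virtual_grf_count)
        return -1;
    return virtual_grf_end[src->reg];
}
```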
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage.
From ARB_transform_feedback3:
"When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the Discarding Rasterization stage (see Section 3.x)
right before rasterization. This counter is incremented whether or not
transform feedback is active."
Unfortunately, we don't have any registers that provide the number of primitives
written to a specific stream other than the ones that track the number of
primitives written to transform feedback in the SOL stage, so we can't
implement this exactly as specified.
In the past we implemented this feature by activating the SOL unit even if
transform feedback was disabled, but making it so that all buffers were
disabled and it only recorded statistics, which gave us the right semantics
(see 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5). Unfortunately, this came with
a significant performance impact and had to be reverted.
This new take does not intend to implement the exact semantics required by
the spec, but improves what we have now, since now we return the primitive
count for stream 0 in all cases. With this patch we use
GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
for non-zero streams. This would return the number of primitives written
to transform feedback for each stream instead. Since non-zero streams are
only useful in combination with transform feedback this should not be too
bad, and the only case that I think we would not be supporting would be
the one in which we want to use both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
detect buffer overflow.
This patch also fixes the following piglit test:
arb_gpu_shader5-xfb-streams-without-invocations
This test uses both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
never hits the overflow case, so both queries are always expected to return
the same value.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.3" <[email protected]>
(cherry picked from commit f976b4c1bf2271cf986be8204147ae986380cc91)
Nominated-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ir_rvalue::constant_expression_value() recursively walks down an IR
tree, attempting to reduce it to a single constant value. This is
useful when you want to know whether a variable has a constant
expression value at all, and if so, what it is.
The constant folding optimization pass attempts to replace rvalues with
their constant expression value from the bottom up. That way, we can
optimize subexpressions, and ideally stop as soon as we find a
non-constant subexpression.
In order to obtain the actual value of an expression, the optimization
pass calls constant_expression_value(). But it should only do so if it
knows the value can be combined into a constant. Otherwise, at each
step of walking back up the tree, it will walk down the tree again, only
to discover what it already knew: it isn't constant.
We properly avoided this call for ir_expression nodes, but not for
ir_swizzle nodes. This patch fixes that, drastically reducing compile
times on certain shaders where tree grafting has given us huge
expression trees. It also fixes SuperTuxKart.
Thanks to Iago and Mike for help in tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Cc: [email protected]
(cherry picked from commit 84a40ce86b1010873b194eb9bf0b8744234b829c)
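The bottom-up strategy described above can be sketched on a toy expression tree. This is not the GLSL compiler's IR: the `expr` type, its three node kinds and the `fold` function are hypothetical. The point it illustrates is that a parent only attempts to fold once its children are already known to be constants, so no subtree is ever re-walked.

```c
#include <assert.h>
#include <stddef.h>

enum expr_kind { EXPR_CONSTANT, EXPR_VARIABLE, EXPR_ADD };

/* Toy expression node: leaves are constants or variables, ADD has two
 * operands. */
struct expr {
    enum expr_kind kind;
    int value;              /* valid when kind == EXPR_CONSTANT */
    struct expr *lhs;
    struct expr *rhs;
};

/* Fold constants from the bottom up, in place. Each node is visited
 * once; a parent checks only its (already folded) children instead of
 * recursively re-evaluating the whole subtree. */
void fold(struct expr *e)
{
    if (e->kind != EXPR_ADD)
        return;

    fold(e->lhs);
    fold(e->rhs);

    if (e->lhs->kind == EXPR_CONSTANT && e->rhs->kind == EXPR_CONSTANT) {
        e->value = e->lhs->value + e->rhs->value;
        e->kind = EXPR_CONSTANT;
    }
}
```

For the tree `(1 + 2) + x`, the inner addition becomes the constant 3 and the outer addition is left alone after a single check of its operands.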
|