| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
The aim of this is to work towards removing UniformHash from the program
struct so that we don't need to hold onto it in memory and pass it around
outside the linker.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are never render target reads, so there are no scheduling hazards.
Giving the extra flexibility to the scheduler makes it possible to do
FB writes as soon as their sources are available, reducing register
pressure. It also makes it possible to do the payload setup for more
than one FB write message at a time, which could better hide latency.
shader-db results on Skylake:
total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
instructions in affected programs: 2898 -> 2855 (-1.48%)
helped: 3
HURT: 0
LOST: 0
GAINED: 1
A reduction in instruction counts is surprising, but legitimate:
the three shaders helped were spilling, and reducing register
pressure allowed us to issue fewer spills/fills.
total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
cycles in affected programs: 4412402 -> 4306114 (-2.41%)
helped: 4457
HURT: 213
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Acked-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
reuse the sampler deref handling code to do the same
thing for atomics.
Acked-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we fixed the glsl->tgsi conversion we no longer need
this function.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The state tracker never handled this properly, and it finally
annoyed me for the second time so I decided to fix it properly.
This is inspired by the NIR sampler lowering code and I only realised
NIR seems to do its deref ordering different to GLSL at the last
minute, once I got that things got much easier.
it fixes a bunch of tests in
tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/
v2: fix AoA tests when forced on.
I was right I didn't need all that code, fixing the AoA code
meant cleaning up a chunk of code I didn't like in the array
handling.
v3: start generalising the code a bit more for atomics.
v3.1: use UniformRemapTable
v4: handle uniforms differently using the param_index,
and go back to UniformStorage
fix issues identified by Timothy with deref handling.
v4.1: squash const fix and move handling 1D const out
of recursive function.
Reviewed-by: Timothy Arceri <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a requirement to store the index into the mesa parameterlist
for uniforms. Up until now we've overwritten var->data.location with
this info. However this then stops us accessing UniformStorage,
which is needed to do proper dereferencing.
Add a new variable to ir_variable to store this value in, and change
the two uses to use it correctly.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Its previous name was somewhat misleading, this really behaves like a
RW cache flush rather than an invalidation.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The state cache is also L3-backed so it seems sensible to make sure
it's clean as we do for other RO caches before repartitioning the L3.
This wasn't part of my original L3 partitioning code because I was
able to reproduce hangs on Gen7 hardware when the state cache
invalidation happened asynchronously with previous 3D rendering, which
should no longer be possible after the previous change.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to split the stalling flush from the RO cache invalidation
into a different PIPE_CONTROL command to make sure that the top of the
pipe invalidation happens after any previous rendering is complete.
Otherwise it's possible for previous rendering to pollute the L3 cache
in the short window of time between RO invalidation and the completion
of the stalling flush. Fixes rendering artifacts on Unigine Heaven,
Metro Last Light Redux and Metro 2033 Redux.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93540
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93599
Tested-by: Darius Spitznagel <[email protected]>
Tested-by: Martin Peres <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The SEL instruction with predication mode NONE emitted when the atomic
operation doesn't need to be predicated is a no-op and might rely on
undocumented hardware behaviour. Noticed by chance while looking at
the assembly output.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94050
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
The errors.c file had grown quite large so split off this extension
code into its own file. This involved making a handful of functions
non-static.
Acked-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes MesaExtensionsTest.AlphabeticallySorted.
Fixes: 1d79b9958090 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vec4 backend, at the end, does this:
if (inst->is_3src()) {
for (int i = 0; i < 3; i++) {
if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
assert(brw_is_single_value_swizzle(inst->src[i].swizzle));
So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.
Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.
Co-authored-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: assert and return if query_memory_info is not set
rebase
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
v2: rebase
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: implement eviction queries properly
add gl_memory_info structure
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLsync objects had a race condition when used from multiple threads
(which is the main point of the extension, really); it could be
validated as a sync object at the beginning of the function, and then
deleted by another thread before use, causing crashes. Fix this by
changing all casts from GLsync to struct gl_sync_object to a new
function _mesa_get_and_ref_sync() that validates and increases
the refcount.
In a similar vein, validation itself uses _mesa_set_search(), which
requires synchronization -- it was called without a mutex held, causing
spurious error returns and other issues. Since _mesa_get_and_ref_sync()
now takes the shared context mutex, this problem is also resolved.
Fixes bug #92757, found while developing Nageru, my live video mixer
(due for release at FOSDEM 2016).
v2: Marek: silence warnings, fix declaration after code
Signed-off-by: Steinar H. Gunderson <[email protected]>
Cc: "11.0 11.1" <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Also fixes a resource leak when an upload_mgr is used for constants.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
At the same time, fix a memory leak noticed by Ilia Mirkin.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
We can get rid of our reference immediately, since the driver will hold
onto it for us.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
While rather unlikely, uploads _can_ fail. Doing them earlier means
we'll have to restore less state when they do fail, and it's slightly
easier to check the restore code.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the framebuffer default sample count was taken directly
from the value given by the application. On the i965 driver on HSW if
the value wasn't one that is supported by the hardware it would hit an
assert when it tried to program the state for it. This patch fixes it
by adding a derived sample count to the state for the default
framebuffer. The driver can then quantize this to one of the valid
values in its UpdateState handler when the _NEW_BUFFERS state changes.
_mesa_geometric_samples is changed to use the new derived value.
Fixes the piglit test arb_framebuffer_no_attachments-query
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93957
Cc: Ilia Mirkin <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise it won't take into account the default samples for
framebuffers with no attachments.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Forwards query result writes to drivers.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Rafal Mielniczuk <[email protected]>
[imirkin: move to GL/GL_CORE section]
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add QueryBuffer and initialise it to NullBufferObj on start
Signed-off-by: Rafal Mielniczuk <[email protected]>
[imirkin: also release QueryBuffer on free]
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Rafal Mielniczuk <[email protected]>
[imirkin: add string to extensions.c]
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Add config option override_vendorid to report a fake card in d3dadapter9 drm.
Signed-off-by: Patrick Rudolph <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No instruction counts changed, but:
total cycles in shared programs: 64834502 -> 64781530 (-0.08%)
cycles in affected programs: 16331544 -> 16278572 (-0.32%)
helped: 4757
HURT: 4288
GAINED: 66
LOST: 20
I remember trying this when I first wrote the pass, but it wasn't
helpful at the time.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
64-bit Pentium 4 CPUs don't have the 3DNow prefetch instructions
which results in an Illegal instruction crash.
Cc: "11.0 11.1" <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Tested-by: Timothy Arceri <[email protected]>
https://bugs.freedesktop.org/show_bug.cgi?id=27512
|
|
|
|
|
|
|
| |
Spotted by Coverity
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- use st->pbo_upload.enabled flag
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
This is where PBO upload will go.
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
| |
We will write our own version of texsubimage for PBO uploads, and we will
want to call that here as well.
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use instancing to generate two triangles for each destination layer and use
a geometry shader to route the layer index.
v2:
- directly write layer in VS if supported by the driver (Marek Olšák)
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Create a PIPE_BUFFER sampler view on the pixel-unpack buffer, and draw
the image on the texture with a fragment shader that maps fragment
coordinates to buffer coordinates.
Modifications by Nicolai Hähnle:
- various cleanups and fixes (e.g. error handling, corner cases)
- split try_pbo_upload into two functions, which will allow code to be
shared with compressed texture uploads
- modify the source format selection to only test for support against
the PIPE_BUFFER target
v2:
- update handling of TGSI_SEMANTIC_POSITION for recent changes in master
- MaxTextureBufferSize is number of texels, not bytes (Ilia Mirkin)
- only enable when integers are supported (Marek Olšák)
- try harder to hit the TextureBufferOffsetAlignment
- remove unnecessary MOV from the fragment shader
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to tell the address generation functions about the dimensionality of
the texture to correctly implement the part of Section 3.8.1 (Texture Image
Specification) of the OpenGL 2.1 specification which says:
"For the purposes of decoding the texture image, TexImage2D is
equivalent to calling TexImage3D with corresponding arguments
and depth of 1, except that
...
* UNPACK SKIP IMAGES is ignored."
Fixes a low impact bug that was found by chance while browsing the spec and
extending piglit tests.
Cc: "11.0 11.1" <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When set to a truish value, this globally disables the minmax cache for all
buffer objects.
No #ifdef DEBUG guards because this option can be interesting for
benchmarking.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When applications stream their index buffers, the caches for those BOs become
useless and add overhead, so we want to disable them. The tricky part is
coming up with the right heuristic for *when* to disable them.
The first question is which hit rate to aim for. Since I'm not aware of any
interesting borderline applications that do something like "draw two or three
times for each upload", I just kept it simple.
The second question is how soon we should give up on the caching. Applications
might have a warm-up phase where they fill a buffer gradually but then keep
reusing it. For this reason, I count the number of indices that hit and miss
(instead of the number of calls that hit or miss), since comparing that to
the size of the buffer makes sense.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|