| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are never render target reads, so there are no scheduling hazards.
Giving the extra flexibility to the scheduler makes it possible to do
FB writes as soon as their sources are available, reducing register
pressure. It also makes it possible to do the payload setup for more
than one FB write message at a time, which could better hide latency.
shader-db results on Skylake:
total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
instructions in affected programs: 2898 -> 2855 (-1.48%)
helped: 3
HURT: 0
LOST: 0
GAINED: 1
A reduction in instruction counts is surprising, but legitimate:
the three shaders helped were spilling, and reducing register
pressure allowed us to issue fewer spills/fills.
total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
cycles in affected programs: 4412402 -> 4306114 (-2.41%)
helped: 4457
HURT: 213
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
| |
Acked-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
reuse the sampler deref handling code to do the same
thing for atomics.
Acked-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we fixed the glsl->tgsi conversion we no longer need
this function.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The state tracker never handled this properly, and it finally
annoyed me for the second time so I decided to fix it properly.
This is inspired by the NIR sampler lowering code and I only realised
NIR seems to do its deref ordering different to GLSL at the last
minute, once I got that things got much easier.
it fixes a bunch of tests in
tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/
v2: fix AoA tests when forced on.
I was right I didn't need all that code, fixing the AoA code
meant cleaning up a chunk of code I didn't like in the array
handling.
v3: start generalising the code a bit more for atomics.
v3.1: use UniformRemapTable
v4: handle uniforms differently using the param_index,
and go back to UniformStorage
fix issues identified by Timothy with deref handling.
v4.1: squash const fix and move handling 1D const out
of recursive function.
Reviewed-by: Timothy Arceri <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a requirement to store the index into the mesa parameterlist
for uniforms. Up until now we've overwritten var->data.location with
this info. However this then stops us accessing UniformStorage,
which is needed to do proper dereferencing.
Add a new variable to ir_variable to store this value in, and change
the two uses to use it correctly.
Reviewed-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Its previous name was somewhat misleading, this really behaves like a
RW cache flush rather than an invalidation.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The state cache is also L3-backed so it seems sensible to make sure
it's clean as we do for other RO caches before repartitioning the L3.
This wasn't part of my original L3 partitioning code because I was
able to reproduce hangs on Gen7 hardware when the state cache
invalidation happened asynchronously with previous 3D rendering, which
should no longer be possible after the previous change.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to split the stalling flush from the RO cache invalidation
into a different PIPE_CONTROL command to make sure that the top of the
pipe invalidation happens after any previous rendering is complete.
Otherwise it's possible for previous rendering to pollute the L3 cache
in the short window of time between RO invalidation and the completion
of the stalling flush. Fixes rendering artifacts on Unigine Heaven,
Metro Last Light Redux and Metro 2033 Redux.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93540
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93599
Tested-by: Darius Spitznagel <[email protected]>
Tested-by: Martin Peres <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The SEL instruction with predication mode NONE emitted when the atomic
operation doesn't need to be predicated is a no-op and might rely on
undocumented hardware behaviour. Noticed by chance while looking at
the assembly output.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94050
|
| |
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
| |
Just to be consistent.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
| |
New functions for examining instructions, declarations, etc.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
The errors.c file had grown quite large so split off this extension
code into its own file. This involved making a handful of functions
non-static.
Acked-by: Timothy Arceri <[email protected]>
|
| |
|
|
|
|
| |
Still some nonsensical comments.
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
To try to reduce the clutter in u_debug.[ch]
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a crash with bin/arb_clear_texture-base-formats and
probably some other tests which use clear_texture().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
There is no need to allocate memory when unwrapping the indirect buf.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes MesaExtensionsTest.AlphabeticallySorted.
Fixes: 1d79b9958090 ("mesa: implement GL_NVX_gpu_memory_info (v2)")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94016
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The builtin data can get released with a glReleaseShaderCompiler call.
We're careful everywhere to clone everything that comes out of builtins
except here, where we accidentally return the signature belonging to the
builtin version, rather than the locally-cloned one.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Tested-by: Rob Herring <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
| |
The builtin function shader is part of the builtin state, released
when glReleaseShaderCompiler is called. We must ensure that the
builtins have been (re)initialized before attempting to link with the
builtin shader.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Tested-by: Rob Herring <[email protected]>
Cc: [email protected]
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Serge Martin <[email protected]>
|
|
|
|
|
|
|
| |
If we are not even looking for one don't bother generating a candidate
list.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
All interface blocks will have been lowered by this point so just
use an assert. Returning false would have caused all sorts of
problems if they were not lowered yet and there is an assert to
catch this later anyway.
We also update the tests to reflect this change.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
found by inspection.
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes compute since 7dd31b81fee7fe40bd09cf3fbc324fcc32782479
gallium/radeon: support PIPE_CAP_SURFACE_REINTERPRET_BLOCKS
Signed-off-by: Jan Vesely <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vec4 backend, at the end, does this:
if (inst->is_3src()) {
for (int i = 0; i < 3; i++) {
if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
assert(brw_is_single_value_swizzle(inst->src[i].swizzle));
So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.
Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.
Co-authored-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" <[email protected]>
|
|
|
|
|
|
|
| |
v2: don't use DIV_ROUND_UP (no so useful)
also return eviction stats
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: assert and return if query_memory_info is not set
rebase
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
v2: rebase
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: implement eviction queries properly
add gl_memory_info structure
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If you're worried about the duplication of some CAPs, we can remove them
later.
v2: add fields for memory eviction stats
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
amdgpu doesn't have to set this, because radeonsi gets it from tile mode
arrays by default.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
v2: print an error to stderr
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLsync objects had a race condition when used from multiple threads
(which is the main point of the extension, really); it could be
validated as a sync object at the beginning of the function, and then
deleted by another thread before use, causing crashes. Fix this by
changing all casts from GLsync to struct gl_sync_object to a new
function _mesa_get_and_ref_sync() that validates and increases
the refcount.
In a similar vein, validation itself uses _mesa_set_search(), which
requires synchronization -- it was called without a mutex held, causing
spurious error returns and other issues. Since _mesa_get_and_ref_sync()
now takes the shared context mutex, this problem is also resolved.
Fixes bug #92757, found while developing Nageru, my live video mixer
(due for release at FOSDEM 2016).
v2: Marek: silence warnings, fix declaration after code
Signed-off-by: Steinar H. Gunderson <[email protected]>
Cc: "11.0 11.1" <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Yet another change motivated by AMD GPUPerfStudio compatibility. These groups
are not directly accessible from userspace, and AMD GPUPerfStudio does not
actually query them - it just requires them to be there. Hence, adding
a placeholder for now.
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This is yet another change motivated by appeasing AMD GPUPerfStudio's
hardcoding of performance counter group numbers.
Reviewed-by: Edward O'Callaghan <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|