| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shaders.
The heuristic enables the SIMD32 fragment shader based on whether the
IR performance modeling pass predicts it to have greater throughput
than the SIMD16 and SIMD8 variants of the same shader. It would be
straightforward to do the same thing in order to control whether
SIMD16 dispatch is enabled, but it's pending additional performance
evaluation.
The INTEL_DEBUG=do32 option is left around in order to force the
SIMD32 shader to be used regardless of the result of the heuristic,
since it's useful as a debugging aid e.g. in order to identify
SIMD32-specific codegen issues which may be masked by the SIMD32
heuristic, or cases where the heuristic is incorrectly disabling
SIMD32 shaders that offer a performance advantage.
Currently this is only enabled on Gen6+, since SIMD32 codegen support
is incomplete on earlier platforms.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_compile_fs() look a bit more similar to
brw_compile_cs(). It saves us three v*_shader_stats local variables,
and will save us additional triplicated declarations as we start
tracking IR performance analysis results.
The triplicated cfg pointers are left around because they're set to
NULL to mark specific dispatch modes as disabled (e.g. in order to
enforce hardware restrictions). Doing the same thing with the visitor
pointers would cause data leaks.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling. It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.
The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts. In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.
But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:
- Implement a similar SIMD16 heuristic allowing the identification of
inefficient SIMD16 fragment shaders.
- Implement similar SIMD16 and SIMD32 heuristics for the compute
shader stage -- Currently compute shader builds always use the
SIMD16 shader if available and never use the SIMD32 shader unless
strictly necessary, which is suboptimal under certain conditions.
- Hook up to the instruction scheduler in order to improve the
accuracy of its timing information.
- Use as heuristic in order to drive the selection of scheduling
modes (Matt was experimenting with that).
- Plug to the TGL software scoreboard pass in order to implement a
more effective SBID token allocation algorithm, since in general
the optimal token allocation depends on the timings of all
instructions in the program.
- Use its bottleneck detection functionality in order to implement a
heuristic computing a more optimal bound for the number of fragment
shader threads executed in parallel (by adjusting the
MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).
As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available. The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This will be re-usable by the IR performance analysis pass.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
fs_instruction_scheduler::is_compressed().
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Makes more sense considering SIMD32. Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
layout.
In Gen12 the Poly 0 Info DWORD containing the Viewport Index and
Render Target Index fields were moved from r0.0 to r1.1 in order to
make room for dual-polygon dispatch. The render target message format
was updated to expect that information in the same location, so we
didn't need to make any changes for framebuffer fetch to work with
SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32,
since the render target message header is assembled from r0 and r2
instead of r1, and the r2 thread payload wasn't updated with an
additional copy of the same information. We need to fix things up
manually instead. This avoids a handful of
EXT_shader_framebuffer_fetch regressions in combination with SIMD32
fragment shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD32.
This applies the same work-around I commited as b84fa0b31e67
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same hardware
bug found empirically. The failure mode seems to be identical.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Gen12 docs are rather contradictory regarding the dispatch
configurations supported by the fragment shader -- The same table
present in previous generations seems to imply that only one dispatch
mode can be enabled when doing per-sample shading, but a restriction
documented in the 3DSTATE_PS_BODY page implies the opposite: That
SIMD32 can only be used in combination with some other dispatch mode.
The latter seems to match the behavior of real hardware as I could
tell from my testing: A bunch of multisample test-cases that do
per-sample shading hang if we only provide a SIMD32 shader.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 spec says:
Following these steps, if a value is so large in magnitude that
it cannot be represented by the returned data type, then the
nearest value representable using that type is returned.
The current code doesn't do the correct thing, because it truncates a
long (potentially a 64bit values) to an int.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
Fixes: 53c36dfcfe3eb3749a53267f054870280afb0d71
("replace IROUND with util functions")
Reviewed-by: Matt Turner <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
|
|
|
|
|
|
|
| |
No shifting yet.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
|
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a resource's backing bo changes, its seqno will be incremented.
Which would result in a new tex state cache key, and nothing to clean
up the old tex state until the sampler view/state is destroyed. But
in some games, that may never happen, or at least not happen before
we run out of memory.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2830
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
|
|
|
|
| |
This will matter in the next patch, where we need the original
rsc->seqno.
It means slight shuffling of where we call rebind_resource() in the
`fd_try_shadow_resource()` path.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
| |
If the resource is rebound, we need to invalidate in all contexts.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
|
|
|
|
| |
Track how resources are used, ie. which state they may potentially dirty
if the backing bo is changed/reallocated, to optimize rebind_resource().
This will be more important in a later patch when we hook up eviction of
entries in a6xx tex state cache.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
| |
Plus a bonus typo fix.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
|
|
| |
The `DISCARD_WHOLE_RESOURCE` is just a hint. And `rebind_resource()` is
a bunch of faffing about (and going to get worse in a later patch), so
let's not bother when the bo is already idle.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes two problems:
- In some cases SWR does not correctly report to Gallium
which formats are supported.
- Incorrect LLVM instructions are used in vertex fetch in some situations
Reviewed-by: Krzysztof Raszkowski <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4788>
|
|
|
|
|
|
|
| |
~50MB gzip'd log files are nicer than ~300MB uncompressed
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should only rely on overflow detection for indirect draws, where we
have no other option.
This doesn't use quite the worst-possible-case sizes, which in practice
seem to be ~20x larger than what is required. But instead uses roughly
half of that.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
|
|
|
|
|
|
|
|
|
| |
For vsc size calculation, we need to know the # of bins per pipe. Or at
least the worst-case # of bins, assuming we don't eliminate an unused depth/
stencil buffer.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
|
|
|
|
|
|
|
|
| |
These are the draw-stream and primitive-stream, so lets give them more
descriptive names.
Signed-off-by: Rob Clark <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
|
|
| |
Make it work with 16-bit and GFX10.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Example situation:
v1 = {v0.hi, v1.lo}
v0.hi = v1.hi
The 4-byte copy's definition is completely used, but swapping it makes no
sense. We have to split it to generate correct code:
swap(v0.hi, v1.lo)
swap(v0.hi, v1.hi)
Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
|
|
|
| |
When splitting a v6b vector into v1 and v2b components, we should ensure
the v1 definition doesn't start at the upper half.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
|
|
|
| |
We can't remove volatile writes and we can't combine them with other
volatile writes so all we can do is clear the unused bits.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
|
|
| |
Fixes: 62332d139c8f6 "nir: Add a local variable-based copy prop..."
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
|
|
|
|
| |
For deref_store, we can still delete invalid stores that write to
statically OOB data. For everything, we need to make sure that we kill
aliases of destinations even if it's volatile.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|
|
|
|
|
| |
Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>
|