Signed-off-by: Alyssa Rosenzweig <[email protected]>
Now that we run RA in a loop, after each failed allocation we choose a spill
node and spill it to Thread Local Storage using st_int4/ld_int4 instructions
(for spills and fills respectively), then try again. This allows us to compile
complex shaders that would not otherwise fit within the 16-work-register
limit, although it comes at a fairly steep performance penalty.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
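A rough sketch of the loop described here, with invented helper names (a
sketch only, not the actual Midgard RA code):

    #include <stdbool.h>

    /* Opaque stand-ins for the compiler's real types. */
    typedef struct compiler_context compiler_context;
    typedef struct ra_graph ra_graph;

    /* Hypothetical helpers standing in for the actual RA entry points. */
    extern ra_graph *try_allocate_registers(compiler_context *ctx, bool *spilled);
    extern unsigned choose_spill_node(compiler_context *ctx, ra_graph *g);
    extern void spill_node_to_tls(compiler_context *ctx, unsigned node);
    extern void install_registers(compiler_context *ctx, ra_graph *g);

    static void
    allocate_with_spilling(compiler_context *ctx)
    {
            for (unsigned iter = 0; iter < 10 /* arbitrary bound */; ++iter) {
                    bool spilled = false;
                    ra_graph *g = try_allocate_registers(ctx, &spilled);

                    if (!spilled) {
                            install_registers(ctx, g);
                            return;
                    }

                    /* Demote the chosen node to Thread Local Storage,
                     * emitting st_int4 for the spill and ld_int4 for fills. */
                    spill_node_to_tls(ctx, choose_spill_node(ctx, g));
            }
    }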
Helps scan the MIR for uses of an index.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
If we write to an index before reading it, the old copy we're checking
liveness for isn't live in this block, even if it does get read later.
Fixes abnormally high register pressure in shaders with loops.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
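A minimal, self-contained illustration of the rule (the data layout is
invented, not the real MIR):

    #include <stdbool.h>

    struct mir_ins {
            int dest;
            int src[3];
            int nr_src;
    };

    /* An index is live-in to a block only if some instruction reads it
     * before any instruction overwrites it. */
    static bool
    index_is_live_in(const struct mir_ins *ins, int count, int node)
    {
            for (int i = 0; i < count; ++i) {
                    for (int s = 0; s < ins[i].nr_src; ++s) {
                            if (ins[i].src[s] == node)
                                    return true;  /* read before any write */
                    }

                    if (ins[i].dest == node)
                            return false; /* overwritten first: the old copy is dead here */
            }

            return false;
    }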
Midgard bundles contain a tag, as well as a copy of the tag of the next
bundle to facilitate prefetch. Do some simple static analysis to detect
certain tag errors (particularly on shaders without branching).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
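A hedged sketch of what such a check could look like (types and field names
are illustrative, not the actual Midgard structures):

    #include <assert.h>

    struct bundle {
            int tag;        /* this bundle's tag */
            int next_tag;   /* copy of the following bundle's tag, for prefetch */
    };

    /* Without branching, a straight-line walk is enough to catch mismatches. */
    static void
    validate_bundle_tags(const struct bundle *b, int count)
    {
            for (int i = 0; i + 1 < count; ++i)
                    assert(b[i].next_tag == b[i + 1].tag);
    }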
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Rather than rewriting an index away across the whole block, we expose
finer (per-instruction) granularity for rewrites.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
It doesn't make any sense to look at it.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
These are used to load/store from Thread Local Storage, which is memory
allocated per-thread (corresponding to ctx->scratchpad in the command
stream) and used for register spilling.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Alyssa Rosenzweig <[email protected]>
It was a crazy idea that didn't pan out. We're better served by a good
copyprop pass. It's also unused now.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Rather than creating either a load or a uniform register read with a
fixed beginning offset, we always create a load and then promote to a
uniform register later. This will allow us to promote in a register
pressure aware manner.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
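Roughly, the later promotion pass can then take a shape like this sketch,
where the helpers are hypothetical and pressure awareness is reduced to a
single budget check:

    #include <stddef.h>
    #include <stdbool.h>

    typedef struct compiler_context compiler_context;
    typedef struct midgard_instruction midgard_instruction;

    /* Hypothetical helpers; the real pass walks the MIR in place. */
    extern midgard_instruction *next_uniform_load(compiler_context *ctx,
                                                  midgard_instruction *prev);
    extern bool fits_in_uniform_registers(compiler_context *ctx,
                                          const midgard_instruction *load,
                                          unsigned budget);
    extern void promote_to_uniform_register(compiler_context *ctx,
                                            midgard_instruction *load);

    static void
    promote_uniforms(compiler_context *ctx, unsigned budget)
    {
            /* Every uniform was emitted as a load; promote only the loads
             * that still fit once register pressure is accounted for. */
            for (midgard_instruction *ld = next_uniform_load(ctx, NULL); ld;
                 ld = next_uniform_load(ctx, ld)) {
                    if (fits_in_uniform_registers(ctx, ld, budget))
                            promote_to_uniform_register(ctx, ld);
            }
    }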
This will allow us to insert instructions as a result of register
allocation, permitting spilling to be implemented. As a side effect,
with the assert commented out this would fix a bunch of glamor crashes
(due to RA failures) so MATE becomes usable.
Ideally we'll have scheduling or RA actually sorted out before the
branch point but if not this gives us a one-line out to get X working...
Signed-off-by: Alyssa Rosenzweig <[email protected]>
What we have is equivalent to the default callback; let's use that.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
depth_stencil_attachment and/or ds_resolve attachment can be NULL.
This fixes crashes with
dEQP-VK.renderpass.suballocation.unused_clear_attachments.*
Cc: 19.1 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Some leaks detected with GL_KHR_debug on i965.
CC: Timothy Arceri <[email protected]>
Signed-off-by: Sergii Romantsov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
GFX10 isn't affected.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
This field doesn't exist.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures
on GFX10, and RADV doesn't promote Z16 or Z24.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
This late optimization pass is only affected by nir_opt_if() and handles all cases
in a single pass. It's enough to call it once after the optimization loop.
No changes on vkpipeline-db.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Since logicop_func 0 is PIPE_LOGICOP_CLEAR, we were triggering lowering of
logic ops on precompiled shaders, which we don't want to do. This also had
the side effect of making shader-db crash: during this lowering we would try
to read the color format swizzle information from the fragment shader key,
which we don't populate in precompiled shaders because right now it is only
needed when logic operations are enabled.
Reviewed-by: Eric Anholt <[email protected]>
If we detect that a scheduling candidate will stall because one of its
register sources is written by the SFU unit in the previous instruction, we
reduce its priority so that any non-stalling operation is chosen instead.
The latency of SFU operations is defined as 2, so they are scheduled earlier
when other candidates have the same priority.
Finally, we don't merge an instruction that stalls into a previously chosen
one, as it would wait an extra cycle for the previous instruction's result.
Although shader-db results show that instruction counts are hurt by 0.35%,
the sum of instructions + stalls is reduced by 0.52% and the total of
sfu-stalls is reduced by 63.51%. This also implies a small increase in the
max-temps metric because SFU operations are scheduled earlier. (A rough
sketch of the priority tweak follows the shader-db numbers below.)
total instructions in shared programs: 9102719 -> 9117851 (0.17%)
instructions in affected programs: 4324628 -> 4339760 (0.35%)
helped: 4162
HURT: 12128
helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1
HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
95% mean confidence interval for instructions value: 0.90 0.96
95% mean confidence interval for instructions %-change: 0.47% 0.50%
Instructions are HURT.
total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
max-temps in affected programs: 4730 -> 4814 (1.78%)
helped: 61
HURT: 134
helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
95% mean confidence interval for max-temps value: 0.28 0.58
95% mean confidence interval for max-temps %-change: 1.80% 3.52%
Max-temps are HURT.
total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
helped: 25882
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
95% mean confidence interval for sfu-stalls value: -2.47 -2.42
95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
helped: 22728
HURT: 855
helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
Inst-and-stalls are helped.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
Reviewed-by: Eric Anholt <[email protected]>
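A loose sketch of the priority tweak described above; only the
v3d_qpu_instr_is_sfu name comes from the patch, everything else is invented
for illustration:

    #include <stdbool.h>

    typedef struct v3d_qpu_instr v3d_qpu_instr;

    /* Approximate prototypes, not the scheduler's real interface. */
    extern bool v3d_qpu_instr_is_sfu(const v3d_qpu_instr *inst);
    extern bool reads_result_of(const v3d_qpu_instr *candidate,
                                const v3d_qpu_instr *prev);

    static int
    candidate_priority(const v3d_qpu_instr *candidate,
                       const v3d_qpu_instr *prev, int base_priority)
    {
            /* Consuming an SFU result one cycle after it is produced stalls,
             * so such candidates lose ties against non-stalling ones. */
            if (prev && v3d_qpu_instr_is_sfu(prev) &&
                reads_result_of(candidate, prev))
                    return base_priority - 1;

            return base_priority;
    }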
SFU operations have a latency of 2 cycles, so if an SFU result is consumed by
the instruction in the very next cycle, the GPU stalls for an extra cycle
until the result is available.
This adds the number of such stalls to the shader-db debug output, along with
the sum of instructions + stalls, to help evaluate optimizations that
schedule instructions so as to avoid generating sfu-stalls.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
Reviewed-by: Eric Anholt <[email protected]>
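A sketch of how such a counter might be derived (not the actual shader-db
plumbing; the second predicate is assumed):

    #include <stdbool.h>

    typedef struct v3d_qpu_instr v3d_qpu_instr;

    extern bool v3d_qpu_instr_is_sfu(const v3d_qpu_instr *inst);   /* from the patch */
    extern bool reads_result_of(const v3d_qpu_instr *inst,
                                const v3d_qpu_instr *prev);        /* assumed */

    static unsigned
    count_sfu_stalls(const v3d_qpu_instr *const *prog, unsigned count)
    {
            unsigned stalls = 0;

            /* An SFU result consumed by the very next instruction costs one
             * extra cycle, so count one stall per such pair. */
            for (unsigned i = 1; i < count; ++i) {
                    if (v3d_qpu_instr_is_sfu(prog[i - 1]) &&
                        reads_result_of(prog[i], prog[i - 1]))
                            stalls++;
            }

            return stalls;
    }

The inst-and-stalls figure reported above is then just the instruction count
plus this number.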
Just like the next line :)
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
snprintf() always terminates the string.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
For es_vgpr_comp_cnt.
Fixes: 795adbbadd4 "radv/gfx10: Add pipeline state support for tess."
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend
a previously queued transfer instead, if it can find one. Compared to
virgl_resource_transfer_prepare, it fails to check whether the resource is
busy.
The existence of a previously queued transfer normally implies that the
resource is not busy, except perhaps when the transfer is
PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy
comment, and potential concerns over breaking it as the transfer code
evolves, this commit makes the valid_buffer_range check the only condition
for taking the fast path.
In the real world, we hit the fast path almost exclusively because of the
valid_buffer_range check. In micro-benchmarks, the condition should always be
true; otherwise the benchmarks are not very representative of meaningful
workloads. I think this fix is justified.
The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disabled the fast path;
this commit re-enables it as well.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
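In other words, the fast path now hinges on the valid-buffer-range test
alone; a self-contained sketch of that kind of check (field names are
invented, not virgl's actual ones):

    #include <stdbool.h>

    /* The range of the buffer that has been written and may be in use. */
    struct valid_range {
            unsigned start;
            unsigned end;
    };

    static bool
    can_take_fast_path(const struct valid_range *valid,
                       unsigned write_start, unsigned write_end)
    {
            /* A write that does not overlap the valid range cannot be
             * observed by any pending read, so extending a previously
             * queued transfer is safe without a busy check. */
            return write_end <= valid->start || write_start >= valid->end;
    }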
Do not take a transfer and do the memcpy. Add a _buffer suffix to
the function name to make it clear that it is only for buffers.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
Without setting hw_res, virgl_transfer_queue_extend never finds a
match and always returns NULL.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>