| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan
with all reduction operations.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
| |
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
Just use the inlined function directly. The macro was replaced with
the function in ebe304fa540f.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
Just use the inlined function directly. The macro was replaced with
the function in ebe304fa540f.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
Just use the inlined function directly. The macro was replaced with
the function in ebe304fa540f.
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Do not flush NaN to 0.
Fixes
dEQP-VK.spirv_assembly.instruction.compute.opquantize.propagated_nans
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
It's similar to GFX9+. Shadow of Mordor (Vulkan beta) hits that
path and it works fine.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: 8d43e2b2ded0fe3c82d4 ("meson: add -Werror=empty-body to disallow `if(x);`")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Can be enabled via the environment variable which tells the
driver how many compilation threads are expected to be called,
and therefore how many forked processes the driver should
create.
For example we would expect to call fossilize replay with
something like this:
RADV_SECURE_COMPILE_THREADS=8 ./fossilize-replay --num-threads 8 \
--shader-cache-size 0 --ignore-derived-pipelines pipeline_cache.foz
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This added support for the fork, the installation of the seccomp
filter, and the main loop for the actual compilation to be called
from i.e. run_secure_compile_device().
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function will be called by the parent process when doing a
secure compile. It first selects a free process to work with then
passes it all the information it needs to compile the pipeline.
Once the pipeline information has been passed to the secure
process, it then waits around to read/write any disk cache entries
required before exiting.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
We don't have permission to be creating shared memory etc.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This allows the secure process to read and write to the disk cache
via the parent process. This commit just adds the functionality
needed for the secure process, the following commit will add the
functionality for the parent process.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
These will be used by the following commits to hold information about
the forked secure compile processes.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This will be used to identify information being passed between the
parent and secure process during a secure compile.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
In a follwing commit we want to be able to call this for secure
compiles from radv_device.c
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This can be usefull for debugging the on disk cache, but is also
useful in the following patch for secure compiles which will be
used to compile huge pipeline collections. These pipeline
collections can be multiple GBs and the in memory cache grows to
multiple GBs very quickly when they are compiled so we want to
be able to turn off the in memory cache.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This is cleaner and avoids having to read/write an additional copy of
topology for use with secure compile.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GFX10 hazards require a different approach compared to previous
generations, for example it doesn't need s_nop, and most hazards
can't be solved by adding NOPs at all. Also, they are not
resolved by branch instructions.
This commit reorganizes aco_insert_NOPs so that there is now a
separate pass for GFX10. The new GFX10 pass also respects the
control flow of the shader.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit refines the VMEMtoScalarWriteHazard mitigation, based
upon a closer look at what LLVM does. Also changes the code to
match the structure of the other hazard mitigations.
* The hazard is not only triggered by VMEM, FLAT and GLOBAL
but also SCRATCH and DS instructions.
* The SMEM/SALU instructions only cause a hazard when they
write a register that the VMEM/etc. are reading.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There is a hazard caused by there is a branch between a
VMEM/GLOBAL/SCRATCH instruction and a DS instruction.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There is a hazard that happens when an SMEM instruction
reads an SGPR and then a VALU instruction writes that same SGPR.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There is a hazard when a non-VALU instruction reads the EXEC mask
and then a VALU instruction writes the EXEC mask.
This commit adds a workaround that avoids the problem.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
| |
Any permlane instruction that follows any VOPC instruction can cause a hazard,
this commit implements a workaround that avoids this causing a problem.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug
(names from LLVM). There are some bugs that ACO needn't care about.
Just to be on the safe side, add an assertion that makes sure
that we aren't hit by FlatSegmentOffsetBug.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec 1.1.126 :
"VK_SHADER_FLOAT_CONTROLS_INDEPENDENCE_32_BIT_ONLY_KHR specifies
that shader float controls for 32-bit floating point can be set
independently; other bit widths must be set identically to each
other."
Forgot to update this when I enabled that extension recently.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.independence_settings.independence_setting
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On GFX8 the number of records is in bytes while on other chips
it's in units of "stride".
Fixes dEQP-VK.robustness.vertex_access.*.draw.vertex_* on RAVEN.
Tested on GFX6, GFX8, GFX10 and RAVEN.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipeline-db (Vega):
SGPRS: 344 -> 344 (0.00 %)
VGPRS: 424 -> 524 (23.58 %)
Spilled SGPRs: 84 -> 80 (-4.76 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 52812 -> 52484 (-0.62 %) bytes
LDS: 135 -> 135 (0.00 %) blocks
Max Waves: 56 -> 53 (-5.36 %)
v2: consider WGP, rework to be clearer and apply the
"maximum 16 workgroups per CU" limit properly
v2: use "SIMD" instead of "EU"
v2: fix spiller by introducing "Program::max_waves"
v2: rename "lds_size" to "lds_limit"
v3: make max_waves actually independant of register usage
v3: fix issue where max_waves was way too high
v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
v4: fix typo from "workgroups_per_cu" rename
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]> (v3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed
number of SGPRs and has 106 addressable SGPRs.
pipeline-db (Vega):
SGPRS: 5912 -> 6232 (5.41 %)
VGPRS: 1772 -> 1780 (0.45 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 88228 -> 87904 (-0.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 559 -> 571 (2.15 %)
piepline-db (Navi):
SGPRS: 341256 -> 363384 (6.48 %)
VGPRS: 171536 -> 170960 (-0.34 %)
Spilled SGPRs: 832 -> 581 (-30.17 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14207332 -> 14190872 (-0.12 %) bytes
LDS: 33 -> 33 (0.00 %) blocks
Max Waves: 18072 -> 18251 (0.99 %)
v2: unconditionally count vcc as an extra sgpr on GFX10+
v3: pass SGPRs rounded to 8
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9.
pipeline-db (ACO/Vega):
SGPRS: 11000 -> 11000 (0.00 %)
VGPRS: 3120 -> 3120 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 164328 -> 164328 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1125 -> 1000 (-11.11 %)
v2: consider wave32
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I recently changed the slow depth/stencil clear path to make sure
depth values are explicitly exported by the fragment shader. This
is actually only useful when VK_EXT_depth_range_unrestricted is
enabled.
While this path is correct, it introduced a performance regression
with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and
probably more titles. This is because it prevents the hardware
to do some optimizations like discarding fragments.
This commit re-introduces the previous (a bit faster) slow
depth/stencil clear path and it selects the unrestricted path
only if VK_EXT_depth_range_unrestricted is enabled.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
descriptorCount is the number of bytes into the descriptor, so
it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.
This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1
Fixes: 8d2654a4197 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
GFX10 does act like GFX9 actually.
This fixes
dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.
Cc: 19.2 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It used to cause weird issues on GFX10 in the past with vkmark and
Wreckfest, and they can't be reproduced now. Shadow Of Mordor
(Vulkan beta) hits that path and it works fine.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.
This just prevents a crash and rbplus probaby needs more work.
Cc: 19.2 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
To prevent out of bounds access.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.
This fixes a conditional jump reported by Valgrind on GFX10:
==194282== Conditional jump or move depends on uninitialised value(s)
==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080
This is caused by an out of bound access of 'fmask_array' (ie. index
is 4 as for 16 samples).
Cc: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
or export"
We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this
waitcnt was necessary for barriers and correctly waiting for SMEM before
s_dcache_wb on GFX10.
Totals from affected shaders:
SGPRS: 33200 -> 33200 (0.00 %)
VGPRS: 31376 -> 31376 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2431804 -> 2433956 (0.09 %) bytes
LDS: 316 -> 316 (0.00 %) blocks
Max Waves: 1609 -> 1609 (0.00 %)
This reverts commit 2c050b49b3d776f054f1265d5523cabb61f22fc3.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Affects 30 shaders in the pipeline-db (all youngblood).
Totals from affected shaders:
SGPRS: 2656 -> 2456 (-7.53 %)
VGPRS: 2260 -> 2260 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 240680 -> 240944 (0.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 90 -> 90 (0.00 %)
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|