path: root/src/amd
Commit message  [Author, Age, Files, Lines]
* radv: make use of radv_sc_read()  [Timothy Arceri, 2019-10-30, 3 files, -39/+76]
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_sc_read() helper  [Timothy Arceri, 2019-10-30, 2 files, -0/+42]
  This is a function with timeout support for reading from the pipe between processes used for secure compile. Initially we hardcode the timeout to 5 seconds. We can adjust the timeout limit in future if needed.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
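A minimal sketch of a read-with-timeout over a pipe, assuming POSIX `select()` is used to wait for data. The name `read_with_timeout()` and its signature are hypothetical; this is not radv's `radv_sc_read()`. In this sketch the timeout applies to each wait for more data, not to the read as a whole.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `size` bytes from `fd`, giving up if no
 * data becomes readable within `timeout_sec` seconds. Returns true on success. */
static bool read_with_timeout(int fd, void *buf, size_t size, long timeout_sec)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    char *dst = buf;
    size_t done = 0;

    while (done < size) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);

        /* select() may modify the timeval, so re-arm it on every iteration. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
        int ret = select(fd + 1, &fds, NULL, NULL, &tv);
        if (ret < 0 && errno == EINTR)
            continue;
        if (ret <= 0)
            return false; /* select() error or timeout */

        ssize_t n = read(fd, dst + done, size - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return false; /* read error, or the writer closed the pipe */
        done += (size_t)n;
    }
    return true;
}
```

A zero `timeout_sec` turns each wait into a non-blocking poll.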
* radv: allow select() calls in secure compile  [Timothy Arceri, 2019-10-30, 1 file, -1/+5]
  This will be used in the following patch to support timeouts for reading the pipe between processes.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: get tcc_harvested from the kernel  [Marek Olšák, 2019-10-28, 1 file, -3/+8]
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: Introduce vgpr_limit to keep track of available VGPRs.  [Timur Kristóf, 2019-10-28, 6 files, -5/+12]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Implement subgroup shuffle in GFX10 wave64 mode.  [Timur Kristóf, 2019-10-28, 6 files, -16/+113]
  Previously subgroup shuffle was implemented using the bpermute instruction, which only works within each half-wave, so by itself it's not suitable for implementing subgroup shuffle when the shader is running in wave64 mode.
  This commit adds a trick using shared VGPRs that still allows subgroup shuffle to be implemented relatively efficiently in this mode.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
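The half-wave limitation can be modelled on the CPU. The sketch below is a simulation, not ACO code: `bpermute_half()` stands in for a permute that can only fetch from the reading lane's own 32-lane half, and the `swapped` array stands in for whatever makes the other half's values reachable (shared VGPRs, per the commit). Each lane then picks one of the two permute results depending on which half its source lane lives in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAVE_SIZE 64
#define HALF_SIZE 32

/* Model of a permute restricted to the reading lane's own 32-lane half. */
static void bpermute_half(const uint32_t src[WAVE_SIZE], const unsigned idx[WAVE_SIZE],
                          uint32_t dst[WAVE_SIZE])
{
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        unsigned half_base = (lane / HALF_SIZE) * HALF_SIZE;
        dst[lane] = src[half_base + (idx[lane] % HALF_SIZE)];
    }
}

/* Full wave64 shuffle from two half-wave permutes: one over the original data
 * and one over a copy with the halves swapped, then a per-lane select. */
static void shuffle_wave64(const uint32_t data[WAVE_SIZE], const unsigned idx[WAVE_SIZE],
                           uint32_t out[WAVE_SIZE])
{
    uint32_t swapped[WAVE_SIZE], same[WAVE_SIZE], other[WAVE_SIZE];
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++)
        swapped[lane] = data[lane ^ HALF_SIZE];

    bpermute_half(data, idx, same);
    bpermute_half(swapped, idx, other);

    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        assert(idx[lane] < WAVE_SIZE);
        bool src_in_same_half = (idx[lane] / HALF_SIZE) == (lane / HALF_SIZE);
        out[lane] = src_in_same_half ? same[lane] : other[lane];
    }
}
```

Both permutes run for every lane; the select only decides which result each lane keeps.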
* aco: Remove dead code in reduction lowering.  [Rhys Perry, 2019-10-28, 1 file, -16/+14]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: Fix reductions on GFX10.  [Rhys Perry, 2019-10-28, 3 files, -18/+95]
  Fixes p_reduce (all cluster sizes), p_inclusive_scan and p_exclusive_scan with all reduction operations.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* util: rename list_empty() to list_is_empty()  [Timothy Arceri, 2019-10-28, 1 file, -1/+1]
  This makes it clear that it's a boolean test and not an action (e.g. "empty the list").
  Reviewed-by: Eric Engestrom <[email protected]>
* util: remove LIST_DEL macro  [Timothy Arceri, 2019-10-28, 1 file, -1/+1]
  Just use the inlined function directly. The macro was replaced with the function in ebe304fa540f.
  Reviewed-by: Eric Engestrom <[email protected]>
* util: remove LIST_ADDTAIL macro  [Timothy Arceri, 2019-10-28, 1 file, -1/+1]
  Just use the inlined function directly. The macro was replaced with the function in ebe304fa540f.
  Reviewed-by: Eric Engestrom <[email protected]>
* util: remove LIST_INITHEAD macro  [Timothy Arceri, 2019-10-28, 1 file, -1/+1]
  Just use the inlined function directly. The macro was replaced with the function in ebe304fa540f.
  Reviewed-by: Eric Engestrom <[email protected]>
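For reference, a minimal sketch of the kind of intrusive, circular doubly linked list these helpers operate on. It is an illustrative re-implementation that borrows the function names `list_inithead()`, `list_addtail()`, `list_del()` and `list_is_empty()`; it is not a copy of Mesa's `util/list.h`. With the uppercase macros gone, callers invoke functions like these directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative circular doubly linked list; an empty list points at itself. */
struct list_head {
    struct list_head *prev, *next;
};

static inline void list_inithead(struct list_head *item)
{
    item->prev = item;
    item->next = item;
}

/* Insert `item` just before the head, i.e. at the tail of `list`. */
static inline void list_addtail(struct list_head *item, struct list_head *list)
{
    item->next = list;
    item->prev = list->prev;
    list->prev->next = item;
    list->prev = item;
}

/* Unlink `item` and clear its pointers so stale use is easier to spot. */
static inline void list_del(struct list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->prev = item->next = NULL;
}

/* A predicate ("is the list empty?"), not an action that empties it. */
static inline bool list_is_empty(const struct list_head *list)
{
    assert(list->next != NULL);
    return list->next == list;
}
```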
* radv: fix OpQuantizeToF16 for NaN on GFX6-7  [Samuel Pitoiset, 2019-10-28, 1 file, -2/+2]
  Do not flush NaN to 0.
  Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.propagated_nans
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
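Every ordered comparison involving NaN is false, so the way the "too small for a normalized half" test is written decides whether NaN gets flushed. The sketch below is a CPU model of OpQuantizeToF16-style behaviour written for illustration, not radv's actual lowering. It assumes IEEE-754 binary32 floats, and round-to-nearest-even when dropping mantissa bits is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof(f));
    return f;
}

/* Quantize a float to the nearest value representable as a normalized half. */
static float quantize_to_f16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    uint32_t sign = bits & 0x80000000u;
    float ax = bits_to_float(bits & 0x7FFFFFFFu); /* |x|; a NaN stays NaN */

    /* Written as "ax < min", the flush test is false for NaN, so NaN falls
     * through. The inverted form !(ax >= min) would be true for NaN and
     * flush it to zero. 2^-14 is the smallest normal half. */
    if (ax < 0x1p-14f)
        return bits_to_float(sign); /* signed zero */
    if (ax != ax)
        return x; /* NaN: keep it a NaN */

    /* Round the 23-bit mantissa to 10 bits, ties to even; a carry out of the
     * mantissa correctly increments the exponent. */
    bits += 0x0FFFu + ((bits >> 13) & 1u);
    bits &= ~0x1FFFu;
    float r = bits_to_float(bits);

    /* Anything above the largest finite half (65504) becomes infinity. */
    if (bits_to_float(bits & 0x7FFFFFFFu) > 65504.0f)
        return bits_to_float(sign | 0x7F800000u);
    return r;
}
```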
* radv: enable fast depth/stencil clears with separate aspects on GFX8  [Samuel Pitoiset, 2019-10-28, 1 file, -7/+0]
  It's similar to GFX9+. Shadow of Mordor (Vulkan beta) hits that path and it works fine.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix empty-body instruction  [Eric Engestrom, 2019-10-27, 1 file, -1/+1]
  Fixes: 8d43e2b2ded0fe3c82d4 ("meson: add -Werror=empty-body to disallow `if(x);`")
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* radv: enable secure compile support  [Timothy Arceri, 2019-10-26, 2 files, -3/+26]
  Can be enabled via the RADV_SECURE_COMPILE_THREADS environment variable, which tells the driver how many compilation threads are expected to be used, and therefore how many forked processes the driver should create.
  For example, we would expect to call fossilize replay with something like this:
    RADV_SECURE_COMPILE_THREADS=8 ./fossilize-replay --num-threads 8 \
        --shader-cache-size 0 --ignore-derived-pipelines pipeline_cache.foz
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for a secure compile fork at device creation  [Timothy Arceri, 2019-10-26, 1 file, -1/+299]
  This adds support for the fork, the installation of the seccomp filter, and the main loop, run_secure_compile_device(), from which the actual compilation is called.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_secure_compile()  [Timothy Arceri, 2019-10-26, 1 file, -0/+134]
  This function will be called by the parent process when doing a secure compile. It first selects a free process to work with, then passes it all the information it needs to compile the pipeline. Once the pipeline information has been passed to the secure process, it waits around to read/write any disk cache entries required before exiting.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: for secure compile exit early from radv_shader_variant_create()  [Timothy Arceri, 2019-10-26, 1 file, -1/+8]
  We don't have permission to create shared memory, etc.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allow the secure process to read and write from disk cache  [Timothy Arceri, 2019-10-26, 1 file, -5/+77]
  This allows the secure process to read and write to the disk cache via the parent process. This commit just adds the functionality needed for the secure process; the following commit will add the functionality for the parent process.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_device_use_secure_compile() helper  [Timothy Arceri, 2019-10-26, 1 file, -0/+6]
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add some new members to radv device and instance for secure compile  [Timothy Arceri, 2019-10-26, 1 file, -0/+21]
  These will be used by the following commits to hold information about the forked secure compile processes.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_secure_compile_type enum  [Timothy Arceri, 2019-10-26, 1 file, -0/+11]
  This will be used to identify information being passed between the parent and secure process during a secure compile.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
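A minimal sketch of how a type tag can frame messages on the pipe between the parent and the secure process. The enum values, the `sc_msg_header` layout and the `send_msg()`/`recv_msg()` helpers are hypothetical and do not reproduce radv's `radv_secure_compile_type` or its actual protocol; EINTR handling is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical message kinds exchanged over the pipe. */
enum sc_msg_type {
    SC_MSG_PIPELINE_INFO = 1,
    SC_MSG_CACHE_READ,
    SC_MSG_CACHE_WRITE,
    SC_MSG_DONE,
};

struct sc_msg_header {
    uint32_t type; /* one of enum sc_msg_type */
    uint32_t size; /* number of payload bytes that follow the header */
};

/* write() may write fewer bytes than requested, so loop until done. */
static bool write_all(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    while (size > 0) {
        ssize_t n = write(fd, p, size);
        if (n <= 0)
            return false;
        p += n;
        size -= (size_t)n;
    }
    return true;
}

static bool read_all(int fd, void *buf, size_t size)
{
    char *p = buf;
    while (size > 0) {
        ssize_t n = read(fd, p, size);
        if (n <= 0)
            return false; /* error, or the other end closed the pipe */
        p += n;
        size -= (size_t)n;
    }
    return true;
}

static bool send_msg(int fd, enum sc_msg_type type, const void *payload, uint32_t size)
{
    assert(size == 0 || payload != NULL);
    struct sc_msg_header hdr = { .type = (uint32_t)type, .size = size };
    return write_all(fd, &hdr, sizeof(hdr)) && (size == 0 || write_all(fd, payload, size));
}

/* Read one header and its payload into `buf`, which holds `cap` bytes. */
static bool recv_msg(int fd, struct sc_msg_header *hdr, void *buf, size_t cap)
{
    if (!read_all(fd, hdr, sizeof(*hdr)) || hdr->size > cap)
        return false;
    return hdr->size == 0 || read_all(fd, buf, hdr->size);
}
```

The receiver dispatches on `hdr.type`, which is the role the commit gives the enum.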
* radv: add radv_create_shaders() to radv_shader.h  [Timothy Arceri, 2019-10-26, 2 files, -1/+10]
  In a following commit we want to be able to call this for secure compiles from radv_device.c
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add debug option to turn off in memory cache  [Timothy Arceri, 2019-10-26, 3 files, -5/+22]
  This can be useful for debugging the on-disk cache, but is also useful in the following patch for secure compiles, which will be used to compile huge pipeline collections. These pipeline collections can be multiple GBs, and the in-memory cache grows to multiple GBs very quickly when they are compiled, so we want to be able to turn off the in-memory cache.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: get topology from pipeline key rather than VkGraphicsPipelineCreateInfo  [Timothy Arceri, 2019-10-26, 1 file, -9/+8]
  This is cleaner and avoids having to read/write an additional copy of topology for use with secure compile.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: Refactor hazard mitigations, separate pass for GFX10.  [Timur Kristóf, 2019-10-25, 1 file, -70/+113]
  GFX10 hazards require a different approach compared to previous generations: for example, GFX10 doesn't need s_nop, and most hazards can't be solved by adding NOPs at all. Also, they are not resolved by branch instructions.
  This commit reorganizes aco_insert_NOPs so that there is now a separate pass for GFX10. The new GFX10 pass also respects the control flow of the shader.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
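One common way to make a hazard pass respect control flow is a forward dataflow analysis: the hazard state at a block's entry is the merge of its predecessors' exit states, iterated to a fixed point so that loop back-edges are accounted for. The sketch below assumes that approach; the commit does not say how ACO's GFX10 pass is structured internally, and the `struct block` summary with `gen`/`kill` bitmasks is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 8
#define MAX_PREDS 4

/* Hypothetical per-block summary. Bit i stands for one tracked hazard kind:
 * `gen` = left pending by the block, `kill` = resolved by the block. */
struct block {
    unsigned num_preds;
    unsigned preds[MAX_PREDS];
    uint32_t gen, kill;
    uint32_t entry, exit; /* computed by propagate_hazards() */
};

/* A hazard may be pending at a block's entry if it is pending at the exit of
 * ANY predecessor, so states merge with OR. Iterate until nothing changes;
 * the sets only grow, so this terminates. */
static void propagate_hazards(struct block *blocks, unsigned n)
{
    assert(n <= MAX_BLOCKS);
    for (unsigned i = 0; i < n; i++)
        blocks[i].entry = blocks[i].exit = 0;

    bool changed = true;
    while (changed) {
        changed = false;
        for (unsigned i = 0; i < n; i++) {
            struct block *b = &blocks[i];
            uint32_t entry = 0;
            for (unsigned p = 0; p < b->num_preds; p++)
                entry |= blocks[b->preds[p]].exit;
            uint32_t exit = (entry & ~b->kill) | b->gen;
            if (entry != b->entry || exit != b->exit) {
                b->entry = entry;
                b->exit = exit;
                changed = true;
            }
        }
    }
}
```

With a loop header whose predecessors are the preheader and the loop body, a hazard generated at the end of the body shows up at the header's entry only after the back-edge has been taken into account. A straight-line scan would miss it.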
* aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.  [Timur Kristóf, 2019-10-25, 1 file, -10/+20]
  This commit refines the VMEMtoScalarWriteHazard mitigation, based upon a closer look at what LLVM does. It also changes the code to match the structure of the other hazard mitigations.
    * The hazard is not only triggered by VMEM, FLAT and GLOBAL but also SCRATCH and DS instructions.
    * The SMEM/SALU instructions only cause a hazard when they write a register that the VMEM/etc. are reading.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
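The refinement in the second bullet reduces to a register-overlap test. A minimal sketch, assuming SGPRs are modelled as bits in a 64-bit mask; the type and helper names are hypothetical, and real register files and operand encodings are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified model: SGPRs 0..63 as bits of a mask. */
typedef uint64_t sgpr_mask;

static sgpr_mask sgpr_range(unsigned first, unsigned count)
{
    assert(first + count <= 64);
    if (count == 0)
        return 0;
    sgpr_mask m = (count == 64) ? ~(sgpr_mask)0 : (((sgpr_mask)1 << count) - 1);
    return m << first;
}

/* The hazard needs an outstanding VMEM/FLAT/GLOBAL/SCRATCH/DS instruction that
 * reads some SGPRs, followed by an SALU/SMEM instruction writing at least one
 * of those same SGPRs. Writes to unrelated SGPRs are not a hazard. */
static bool vmem_to_scalar_write_hazard(sgpr_mask pending_vmem_reads, sgpr_mask scalar_writes)
{
    return (pending_vmem_reads & scalar_writes) != 0;
}
```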
* aco/gfx10: Mitigate LdsBranchVmemWARHazard.  [Timur Kristóf, 2019-10-25, 2 files, -0/+66]
  There is a hazard caused by a branch between a VMEM/GLOBAL/SCRATCH instruction and a DS instruction. This commit adds a workaround that avoids the problem.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/gfx10: Mitigate SMEMtoVectorWriteHazard.  [Timur Kristóf, 2019-10-25, 2 files, -0/+70]
  There is a hazard that happens when an SMEM instruction reads an SGPR and then a VALU instruction writes that same SGPR. This commit adds a workaround that avoids the problem.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/gfx10: Mitigate VcmpxExecWARHazard.  [Timur Kristóf, 2019-10-25, 2 files, -0/+59]
  There is a hazard when a non-VALU instruction reads the EXEC mask and then a VALU instruction writes the EXEC mask. This commit adds a workaround that avoids the problem.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/gfx10: Mitigate VcmpxPermlaneHazard.  [Timur Kristóf, 2019-10-25, 2 files, -0/+28]
  Any permlane instruction that follows any VOPC instruction can cause a hazard. This commit implements a workaround that avoids the problem.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/gfx10: Add notes about some GFX10 hazards.  [Timur Kristóf, 2019-10-25, 2 files, -2/+37]
  ACO currently mitigates VMEMtoScalarWriteHazard and Offset3fBug (names from LLVM). There are some bugs that ACO needn't care about. Just to be on the safe side, add an assertion that makes sure that we aren't hit by FlatSegmentOffsetBug.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* radv: fix VK_KHR_shader_float_controls dependency on GFX6-7  [Samuel Pitoiset, 2019-10-25, 1 file, -3/+3]
  From the Vulkan spec 1.1.126:
    "VK_SHADER_FLOAT_CONTROLS_INDEPENDENCE_32_BIT_ONLY_KHR specifies that shader float controls for 32-bit floating point can be set independently; other bit widths must be set identically to each other."
  Forgot to update this when I enabled that extension recently.
  Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls.independence_settings.independence_setting
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: compute the number of records correctly for vertex buffers  [Samuel Pitoiset, 2019-10-24, 1 file, -5/+7]
  On GFX8 the number of records is in bytes while on other chips it's in units of "stride".
  Fixes dEQP-VK.robustness.vertex_access.*.draw.vertex_* on RAVEN.
  Tested on GFX6, GFX8, GFX10 and RAVEN.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
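The rule in this commit message can be written as a small helper. This is a hypothetical sketch, not radv's code: `vertex_buffer_num_records()` is an invented name, truncation to whole vertices is assumed for the stride case, and treating a zero stride as a byte count is an assumption made here to avoid dividing by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: GFX8 counts vertex-buffer records in bytes, other
 * chips in units of the vertex stride. */
static uint32_t vertex_buffer_num_records(bool is_gfx8, uint32_t size_bytes, uint32_t stride)
{
    if (is_gfx8 || stride == 0)
        return size_bytes;      /* assumption: zero stride falls back to bytes */
    return size_bytes / stride; /* only whole vertices are counted */
}
```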
* aco: take LDS into account when calculating num_waves  [Rhys Perry, 2019-10-23, 4 files, -7/+42]
  pipeline-db (Vega):
    SGPRS: 344 -> 344 (0.00 %)
    VGPRS: 424 -> 524 (23.58 %)
    Spilled SGPRs: 84 -> 80 (-4.76 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 52812 -> 52484 (-0.62 %) bytes
    LDS: 135 -> 135 (0.00 %) blocks
    Max Waves: 56 -> 53 (-5.36 %)
  v2: consider WGP, rework to be clearer and apply the "maximum 16 workgroups per CU" limit properly
  v2: use "SIMD" instead of "EU"
  v2: fix spiller by introducing "Program::max_waves"
  v2: rename "lds_size" to "lds_limit"
  v3: make max_waves actually independent of register usage
  v3: fix issue where max_waves was way too high
  v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
  v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
  v4: fix typo from "workgroups_per_cu" rename
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]> (v3)
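A rough sketch of how LDS usage caps occupancy. The figures used are assumed GCN/Vega-style values for illustration: 64 KiB of LDS per CU, 4 SIMDs per CU, at most 10 waves per SIMD, at most 16 workgroups per CU. The function is hypothetical, ignores LDS allocation granularity, and does not model GFX10's WGP mode.

```c
#include <assert.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Assumed GCN-style limits, for illustration only. */
#define LDS_PER_CU            (64 * 1024) /* bytes */
#define SIMDS_PER_CU          4
#define MAX_WAVES_PER_SIMD    10
#define MAX_WORKGROUPS_PER_CU 16

/* Upper bound on waves per SIMD imposed by LDS usage alone. */
static unsigned lds_limited_waves_per_simd(unsigned lds_per_workgroup,
                                           unsigned workgroup_size,
                                           unsigned wave_size)
{
    assert(workgroup_size > 0 && wave_size > 0);

    /* How many workgroups fit in one CU's LDS, capped at the per-CU limit. */
    unsigned workgroups_per_cu = MAX_WORKGROUPS_PER_CU;
    if (lds_per_workgroup > 0) {
        unsigned by_lds = LDS_PER_CU / lds_per_workgroup;
        if (by_lds < workgroups_per_cu)
            workgroups_per_cu = by_lds;
    }

    /* Spread those workgroups' waves over the SIMDs; rounding up gives the
     * load on the busiest SIMD. */
    unsigned waves_per_workgroup = DIV_ROUND_UP(workgroup_size, wave_size);
    unsigned waves = DIV_ROUND_UP(workgroups_per_cu * waves_per_workgroup, SIMDS_PER_CU);
    return waves < MAX_WAVES_PER_SIMD ? waves : MAX_WAVES_PER_SIMD;
}
```

For example, 32 KiB of LDS per 256-thread workgroup in wave64 allows 2 workgroups per CU, i.e. 8 waves, or 2 per SIMD.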
* aco: increase accuracy of SGPR limits  [Rhys Perry, 2019-10-23, 6 files, -28/+95]
  SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed number of SGPRs and has 106 addressable SGPRs.
  pipeline-db (Vega):
    SGPRS: 5912 -> 6232 (5.41 %)
    VGPRS: 1772 -> 1780 (0.45 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 88228 -> 87904 (-0.37 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 559 -> 571 (2.15 %)
  pipeline-db (Navi):
    SGPRS: 341256 -> 363384 (6.48 %)
    VGPRS: 171536 -> 170960 (-0.34 %)
    Spilled SGPRs: 832 -> 581 (-30.17 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 14207332 -> 14190872 (-0.12 %) bytes
    LDS: 33 -> 33 (0.00 %) blocks
    Max Waves: 18072 -> 18251 (0.99 %)
  v2: unconditionally count vcc as an extra sgpr on GFX10+
  v3: pass SGPRs rounded to 8
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* radv: round vgprs/sgprs before calculating max_waves  [Rhys Perry, 2019-10-23, 1 file, -4/+8]
  Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9.
  pipeline-db (ACO/Vega):
    SGPRS: 11000 -> 11000 (0.00 %)
    VGPRS: 3120 -> 3120 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 164328 -> 164328 (0.00 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 1125 -> 1000 (-11.11 %)
  v2: consider wave32
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
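A sketch of why rounding matters. It assumes GFX8/GFX9 wave64 figures: 800 SGPRs per SIMD allocated in granules of 16, 256 VGPRs per lane allocated in granules of 4, and at most 10 waves per SIMD. The helper and macro names are hypothetical, and wave32/GFX10 is not modelled. Dividing the register file by the unrounded count overestimates occupancy. For example, 81 SGPRs would suggest 800 / 81 = 9 waves, but 81 rounds up to 96, and 800 / 96 = 8.

```c
#include <assert.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Assumed GFX8/GFX9 wave64 figures, for illustration only. */
#define SGPRS_PER_SIMD     800
#define SGPR_GRANULE       16
#define VGPRS_PER_LANE     256
#define VGPR_GRANULE       4
#define MAX_WAVES_PER_SIMD 10

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Registers are handed out in granules, so round the shader's counts up
 * before dividing the register file among waves. */
static unsigned reg_limited_waves_per_simd(unsigned num_sgprs, unsigned num_vgprs)
{
    assert(num_sgprs > 0 && num_vgprs > 0);
    unsigned sgprs = ALIGN_UP(num_sgprs, SGPR_GRANULE);
    unsigned vgprs = ALIGN_UP(num_vgprs, VGPR_GRANULE);

    unsigned waves = MAX_WAVES_PER_SIMD;
    waves = min_u(waves, SGPRS_PER_SIMD / sgprs);
    waves = min_u(waves, VGPRS_PER_LANE / vgprs);
    return waves;
}
```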
* radv: fix a performance regression with graphics depth/stencil clears  [Samuel Pitoiset, 2019-10-23, 2 files, -25/+123]
  I recently changed the slow depth/stencil clear path to make sure depth values are explicitly exported by the fragment shader. This is actually only useful when VK_EXT_depth_range_unrestricted is enabled.
  While this path is correct, it introduced a performance regression with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and probably more titles. This is because it prevents the hardware from doing some optimizations, like discarding fragments.
  This commit re-introduces the previous (slightly faster) slow depth/stencil clear path and selects the unrestricted path only if VK_EXT_depth_range_unrestricted is enabled.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix vkUpdateDescriptorSets with inline uniform blocks  [Samuel Pitoiset, 2019-10-23, 1 file, -0/+8]
  descriptorCount is the number of bytes to copy, so it shouldn't be used as an index. srcArrayElement/dstArrayElement specify the starting byte offset within the binding to copy from/to.
  This fixes new CTS tests:
    dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
    dEQP-VK.binding_model.descriptor_copy.*.mix_3
    dEQP-VK.binding_model.descriptor_copy.*.mix_array1
  Fixes: 8d2654a4197 ("radv: Support VK_EXT_inline_uniform_block.")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
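For inline uniform blocks, the copy works on byte quantities: the array elements are byte offsets and the count is a byte count. A hypothetical sketch of such a copy (the function is invented for illustration and is not radv's implementation):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy between two inline uniform block bindings. The array
 * elements are byte offsets and descriptor_count is a byte count, so this is
 * a plain byte copy, never an index into an array of descriptors. */
static void copy_inline_uniform_block(uint8_t *dst_binding, size_t dst_binding_size,
                                      uint32_t dst_array_element,
                                      const uint8_t *src_binding, size_t src_binding_size,
                                      uint32_t src_array_element,
                                      uint32_t descriptor_count)
{
    assert((size_t)dst_array_element + descriptor_count <= dst_binding_size);
    assert((size_t)src_array_element + descriptor_count <= src_binding_size);
    memcpy(dst_binding + dst_array_element, src_binding + src_array_element,
           descriptor_count);
}
```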
* radv/gfx10: fix 3D images  [Samuel Pitoiset, 2019-10-23, 3 files, -17/+17]
  GFX10 does act like GFX9 actually.
  This fixes dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.
  Cc: 19.2 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: re-enable fast depth/stencil clears with separate aspects  [Samuel Pitoiset, 2019-10-23, 1 file, -3/+2]
  It used to cause weird issues on GFX10 in the past with vkmark and Wreckfest, and they can't be reproduced now. Shadow Of Mordor (Vulkan beta) hits that path and it works fine.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not emit rbplus if attachments are undefined  [Samuel Pitoiset, 2019-10-23, 1 file, -0/+3]
  Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer on Raven and other chips that allow rbplus.
  This just prevents a crash and rbplus probably needs more work.
  Cc: 19.2 <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add an assertion in radv_gfx10_compute_bin_size()  [Samuel Pitoiset, 2019-10-23, 1 file, -0/+1]
  To prevent out-of-bounds access.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not create meta pipelines with 16 samples  [Samuel Pitoiset, 2019-10-23, 2 files, -5/+5]
  The driver only supports up to 8 samples, so it's useless to create more pipelines than needed.
  This fixes a conditional jump reported by Valgrind on GFX10:
    ==194282== Conditional jump or move depends on uninitialised value(s)
    ==194282== at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
    ==194282== by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
    ==194282== by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
    ==194282== by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
    ==194282== by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
    ==194282== by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
    ==194282== by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
    ==194282== by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
    ==194282== by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
    ==194282== by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080)
  This is caused by an out-of-bounds access of 'fmask_array' (i.e. the index is 4 for 16 samples).
  Cc: <[email protected]>
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "aco: only emit waitcnt on loop continues if we there was some load or export"  [Rhys Perry, 2019-10-22, 1 file, -1/+1]
  We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this waitcnt was necessary for barriers and correctly waiting for SMEM before s_dcache_wb on GFX10.
  Totals from affected shaders:
    SGPRS: 33200 -> 33200 (0.00 %)
    VGPRS: 31376 -> 31376 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 2431804 -> 2433956 (0.09 %) bytes
    LDS: 316 -> 316 (0.00 %) blocks
    Max Waves: 1609 -> 1609 (0.00 %)
  This reverts commit 2c050b49b3d776f054f1265d5523cabb61f22fc3.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add missing bld.scc()  [Rhys Perry, 2019-10-22, 1 file, -1/+1]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: keep can_reorder/barrier when combining addition into SMEM  [Rhys Perry, 2019-10-22, 1 file, -0/+2]
  Affects 30 shaders in the pipeline-db (all youngblood).
  Totals from affected shaders:
    SGPRS: 2656 -> 2456 (-7.53 %)
    VGPRS: 2260 -> 2260 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 240680 -> 240944 (0.11 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 90 -> 90 (0.00 %)
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: add a few missing checks in value numbering  [Rhys Perry, 2019-10-22, 1 file, -1/+4]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco: use ds_read2_b64/ds_write2_b64  [Rhys Perry, 2019-10-22, 1 file, -7/+24]
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>