aboutsummaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: gather clip/cull distances in the shader info passSamuel Pitoiset2019-09-062-21/+25
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: move ac_fill_shader_info() to radv_nir_shader_info_pass()Samuel Pitoiset2019-09-062-45/+38
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: merge radv_shader_variant_info into radv_shader_infoSamuel Pitoiset2019-09-066-293/+275
| | | | | | | Having two different structs is useless. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/gfx10: always set ballot_mask_bits to 64Samuel Pitoiset2019-09-061-2/+1
| | | | | | | | The codegen handles it and it adds the correct casts. This fixes a bunch of LLVM validation errors when enabling Wave32 for compute. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: allow specifying filter callback in lower_alu_to_scalarVasily Khoruzhick2019-09-061-1/+1
| | | | | | | | | | | | | Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Vasily Khoruzhick <[email protected]>
* radv: Call nir_propagate_invariant()Connor Abbott2019-09-051-0/+2
| | | | | | | | Without this, invariant qualifiers don't do anything. Together with a fix to the game, this fixes flickering in No Man's Sky. Cc: [email protected] Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: Enable nir_opt_large_constantsConnor Abbott2019-09-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vkpipeline-db numbers: Totals: SGPRS: 1740306 -> 1741322 (0.06 %) VGPRS: 1331124 -> 1331712 (0.04 %) Spilled SGPRs: 21201 -> 21316 (0.54 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 256 -> 256 (0.00 %) dwords per thread Code Size: 79022628 -> 78694788 (-0.41 %) bytes LDS: 6500 -> 6500 (0.00 %) blocks Max Waves: 301413 -> 301302 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 53633 -> 54649 (1.89 %) VGPRS: 53000 -> 53588 (1.11 %) Spilled SGPRs: 3454 -> 3569 (3.33 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5284232 -> 4956392 (-6.20 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 4239 -> 4128 (-2.62 %) Wait states: 0 -> 0 (0.00 %) (The biggest VGPR and max wave regression is due to unrolling a loop, which made the scheduler more aggressive, but in this case it's able to effectively hide latency so it's actually probably a win.) shader-db numbers with radeonsi NIR: Totals: SGPRS: 3526496 -> 3526512 (0.00 %) VGPRS: 2198576 -> 2198576 (0.00 %) Spilled SGPRs: 10463 -> 10463 (0.00 %) Spilled VGPRs: 86 -> 86 (0.00 %) Private memory VGPRs: 3182 -> 2528 (-20.55 %) Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread Code Size: 74117280 -> 74106140 (-0.02 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 775846 -> 775844 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 856 -> 872 (1.87 %) VGPRS: 680 -> 680 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 654 -> 0 (-100.00 %) Scratch size: 668 -> 0 (-100.00 %) dwords per thread Code Size: 49652 -> 38512 (-22.44 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 182 -> 180 (-1.10 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: Support load_constant intrinsicsConnor Abbott2019-09-051-0/+55
| | | | | | | Setup a constant global variable that LLVM will stick in a .rodata section and generate PC-relative loads for. Reviewed-by: Marek Olšák <[email protected]>
* radv/radeonsi: Don't count read-only data when reporting code sizeConnor Abbott2019-09-055-3/+13
| | | | | | | | | | We usually use these counts as a simple way to figure out if a change reduces the number of instructions or shrinks an instruction. However, since .rodata sections aren't executed, we shouldn't be counting their size for this analysis. Make the linker return the total executable size, and use it to report the more useful size in both drivers. Reviewed-by: Marek Olšák <[email protected]>
* ac/nir: Fix gather4 integer wa with unnormalized coordinatesConnor Abbott2019-09-031-4/+26
| | | | | | | | | | | | | | This adds a bit of unneccesary code on radeonsi, since whether unnormalized coordinates are used is known at compile time with GL, but I wasn't sure if it was worth the few instructions to plumb everything through, especially for something so rare -- my shader-db doesn't have any instances where this changes anything. Fixes CTS tests I created at https://github.com/cwabbott0/VK-GL-CTS/tree/unnorm-gather-tests Acked-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Rewrite gather4 integer workaround based on radeonsiConnor Abbott2019-09-031-65/+85
| | | | | | | | | | The workaround was originally written based on amdgpu-pro traces, but since then radeonsi has got its own slightly different version. Use the radeonsi version instead, to be consistent and because it'll be slightly more convenient for handling unnormalized coordinates. Acked-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: keep a pointer to a NIR shader into radv_shader_contextSamuel Pitoiset2019-08-301-36/+24
| | | | | | | This avoids multiple copies for nothing and it's more elegant. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: move setting can_discard to ac_fill_shader_info()Samuel Pitoiset2019-08-301-1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: replace ac_nir_build_if by ac_build_ifccSamuel Pitoiset2019-08-301-107/+13
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove radv_init_llvm_target() helperSamuel Pitoiset2019-08-301-33/+1
| | | | | | | RADV no longer uses specific LLVM options compared to the common code. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove useless ac_llvm_util.h include from the WSI codeSamuel Pitoiset2019-08-301-1/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove unused shader_info parameter in ac_compile_llvm_module()Samuel Pitoiset2019-08-301-3/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: remove some unused fields from radv_shader_contextSamuel Pitoiset2019-08-301-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: move lowering PS inputs/outputs at the right placeSamuel Pitoiset2019-08-303-5/+8
| | | | | | | At shaders creation, just after NIR linking. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: gather info about PS inputs in the shader info passSamuel Pitoiset2019-08-304-74/+53
| | | | | | | It's the right place to do that. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac: drop now useless lookup_interp_param from ABISamuel Pitoiset2019-08-303-39/+32
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: import linear/perspective PS input parameters from radv/radeonsiSamuel Pitoiset2019-08-302-17/+23
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv/gfx10: compute the LDS size for exporting PrimID for VSSamuel Pitoiset2019-08-291-0/+9
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: add PKT3_CONTEXT_REG_RMWMarek Olšák2019-08-271-0/+1
| | | | Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
* radv: make use of has_ls_vgpr_init_bugSamuel Pitoiset2019-08-273-2/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add has_ls_vgpr_init_bug to ac_gpu_infoSamuel Pitoiset2019-08-272-0/+4
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_msaa_sample_loc_bug to ac_gpu_infoSamuel Pitoiset2019-08-272-0/+6
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add rbplus_allowed to ac_gpu_infoSamuel Pitoiset2019-08-276-13/+13
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_tc_compat_zrange_bug to ac_gpu_infoSamuel Pitoiset2019-08-276-6/+7
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_gfx9_scissor_bug to ac_gpu_infoSamuel Pitoiset2019-08-276-8/+9
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add cpdma_prefetch_writes_memory to ac_gpu_infoSamuel Pitoiset2019-08-275-4/+4
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_out_of_order_rast to ac_gpu_infoSamuel Pitoiset2019-08-275-6/+6
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_load_ctx_reg_pkt to ac_gpu_infoSamuel Pitoiset2019-08-275-10/+8
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_rbplus to ac_gpu_infoSamuel Pitoiset2019-08-275-5/+7
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_dcc_constant_encode to ac_gpu_infoSamuel Pitoiset2019-08-275-8/+6
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_distributed_tess to ac_gpu_infoSamuel Pitoiset2019-08-275-6/+6
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: add has_clear_state to ac_gpu_infoSamuel Pitoiset2019-08-275-16/+18
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: drop llvm8 from some load/store helpersSamuel Pitoiset2019-08-271-115/+75
| | | | | | | | Cleanup. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radv: add mipmap support for the clear depth/stencil valuesSamuel Pitoiset2019-08-262-27/+59
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add mipmap support for the TC-compat zrange bugSamuel Pitoiset2019-08-263-24/+51
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: allocate metadata space for mipmapped depth/stencil imagesSamuel Pitoiset2019-08-261-2/+2
| | | | | | | | For each mipmaps, the driver will store the clear values (8-bytes) and the TC-compat zrange value (4-bytes). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: decompress mipmapped depth/stencil images during transitionsSamuel Pitoiset2019-08-263-12/+7
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add mipmaps support for decompress/resummarizeSamuel Pitoiset2019-08-261-25/+32
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_process_depth_image_layer() helperSamuel Pitoiset2019-08-261-63/+75
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Remove gfx9_stride_size_workaround_for_atomicConnor Abbott2019-08-263-11/+1
| | | | | | | | | The workaround was entirely in common code, and it's needed in radeonsi too so just always do it when necessary. Fixes KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9 with LLVM 8. Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: add a workaround for viewing a slice of 3D as a 2D imageConnor Abbott2019-08-261-3/+32
| | | | | | | | | | | | GL and Vulkan allow you to bind a single layer of a 3D texture to a 2D image, and we weren't implementing a workaround for that on gfx9 that TGSI was. Copy it over. Fixes KHR-GL45.shader_image_load_store.non-layered_binding with radeonsi NIR. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix getting the index type size for uint8_tSamuel Pitoiset2019-08-261-1/+1
| | | | | | | | | | 16-bit and 32-bit values match hardware values but 8-bit doesn't. This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index. Fixes: 372c3dcfdb8 ("radv: implement VK_EXT_index_type_uint8") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]
* radv: Change memory type order for GPUs without dedicated VRAMAlex Smith2019-08-241-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | Put the uncached GTT type at a higher index than the visible VRAM type, rather than having GTT first. When we don't have dedicated VRAM, we don't have a non-visible VRAM type, and the property flags for GTT and visible VRAM are identical. According to the spec, for types with identical flags, we should give the one with better performance a lower index. Previously, apps which follow the spec guidance for choosing a memory type would have picked the GTT type in preference to visible VRAM (all Feral games will do this), and end up with lower performance. On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of other (Feral) games and saw similar improvement on those as well. Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: 19.2 <[email protected]> (Bas: CCing this to 19.2-rc due to high impact and limited complexity)
* radv: additional query fixesAndres Rodriguez2019-08-171-7/+8
| | | | | | | | | Make sure we read the updated data from the gpu in cases where WAIT_BIT is not set. Cc: 19.1 19.2 <[email protected] Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: Assert GS input index is constantConnor Abbott2019-08-231-0/+1
| | | | | | If it's not we silently ignore indir_index which is definitely a bug. Reviewed-by: Marek Olšák <[email protected]>