path: root/src/amd
Commit message [Author, Date, Files, Lines]
* ac: fix the return value in cull_bbox when bbox culling is disabled [Marek Olšák, 2019-12-16, 1 file, -1/+1]
  Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
* ac: fix ac_get_i1_sgpr_mask for Wave32 [Marek Olšák, 2019-12-16, 1 file, -2/+11]
  Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
* amd/common: Always use addrlib for HTILE tc-compat. [Bas Nieuwenhuizen, 2019-12-14, 1 file, -11/+4]
  Even without depth+stencil, addrlib can (correctly!) decide to disable
  TC-compatible HTILE. One example is 8x sampling with 32-bit depth on Stoney:
  the row size on Stoney is 1024 bytes, while the tile size is 2048 bytes, which
  results in tile splits, and tile splits are not supported with tc-compat.
  On Stoney, this fixes dEQP-VK.glsl.builtin_var.fragdepth.*_list_d32_sfloat_multisample_8
  CC: <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
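The tile-split arithmetic above can be checked in a few lines. This is a simplified model, not addrlib's logic: it assumes an 8x8-pixel micro tile whose footprint is 8 × 8 × bytes per sample × samples, and it treats any micro tile larger than the DRAM row as requiring a tile split. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bytes covered by one 8x8 micro tile of a
 * multisampled depth surface (simplified model, not addrlib). */
static uint32_t micro_tile_bytes(uint32_t bytes_per_sample, uint32_t samples)
{
    return 8u * 8u * bytes_per_sample * samples;
}

/* Simplified rule: a micro tile that does not fit in one DRAM row
 * has to be split across rows. */
static bool micro_tile_needs_split(uint32_t bytes_per_sample, uint32_t samples,
                                   uint32_t row_size_bytes)
{
    return micro_tile_bytes(bytes_per_sample, samples) > row_size_bytes;
}
```

With 32-bit depth and 8x MSAA this gives 2048 bytes, twice the 1024-byte row; at 4x the tile is exactly 1024 bytes and, under this model, needs no split.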
* amd/common: Fix tcCompatible degradation on Stoney. [Bas Nieuwenhuizen, 2019-12-14, 1 file, -1/+1]
  addrlib sometimes returns smaller sizes for tcCompat, as it does not seem to
  take into account the configuration gymnastics needed to keep depth and
  stencil matching with tcCompat.
  This fixes dEQP-VK.pipeline.render_to_image.core.2d_array.huge.height.r8g8b8a8_unorm_d32_sfloat_s8_uint
  CC: <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
* radv: handle unaligned vertex fetches on GFX6/GFX10 [Samuel Pitoiset, 2019-12-13, 1 file, -47/+86]
  The Vulkan spec says nothing about the alignment of vertex attributes.
  Fixes a test failure on GFX6 and a GPU hang on GFX10 with:
  dEQP-VK.spirv_assembly.instruction.spirv1p4.entrypoint.tess_con_pc_entry_point
  vkpipeline-db results on GFX10:
  Totals from affected shaders:
  SGPRS: 463772 -> 472972 (1.98 %)
  VGPRS: 343208 -> 343752 (0.16 %)
  Spilled SGPRs: 323 -> 336 (4.02 %)
  Spilled VGPRs: 0 -> 0 (0.00 %)
  Code Size: 13806200 -> 14164472 (2.60 %) bytes
  Max Waves: 84021 -> 83755 (-0.32 %)
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2161
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
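The message does not say how the unaligned case is handled. Purely as a generic illustration (an assumption, not necessarily what radv generates), a 32-bit attribute at an unaligned byte offset can always be read with byte-sized loads and reassembled, since byte loads never have an alignment requirement. `load_u32_unaligned` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value starting at an
 * arbitrary byte offset, using only byte-sized (always aligned) loads. */
static uint32_t load_u32_unaligned(const uint8_t *buf, size_t byte_offset)
{
    const uint8_t *p = buf + byte_offset;
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```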
* radv: enable SpvCapabilityImageMSArray [Samuel Pitoiset, 2019-12-12, 1 file, -0/+1]
  The Vulkan spec says that the StorageImageMultisample and ImageMSArray SPIR-V
  capabilities must be enabled if the shaderStorageImageMultisample feature is
  supported. This fixes a warning with RenderDoc.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2212
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix out-of-bound access when loading constants from global [Samuel Pitoiset, 2019-12-12, 1 file, -4/+14]
  Global load/store instructions can't know whether the offset is out of bounds,
  because they don't use descriptors (so there is no range to check against).
  Fix this by clamping the offset for arrays that are indexed with a non-constant
  offset that is greater than or equal to the array size.
  This fixes VM faults and GPU hangs with Dead Rising 4.
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
  Fixes: 71a67942003 ("ac/nir: Enable nir_opt_large_constants")
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
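A minimal sketch of the clamping idea, assuming the offset is clamped so that one full element still lies inside the constant array (the exact clamp the patch uses is not stated in the message). `clamp_const_offset` and its parameters are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: keep a dynamically computed byte offset inside a
 * constant array of array_size bytes holding elem_size-byte elements.
 * Offsets that would read past the end are clamped to the last element,
 * so a descriptor-less global load never leaves the array.
 * Assumes array_size >= elem_size. */
static uint32_t clamp_const_offset(uint32_t offset, uint32_t array_size,
                                   uint32_t elem_size)
{
    uint32_t last = array_size - elem_size;
    return offset > last ? last : offset;
}
```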
* radv: Fix RGBX Android<->Vulkan format correspondence. [Bas Nieuwenhuizen, 2019-12-11, 1 file, -1/+1]
  This is correct per the format equivalence table in the Vulkan spec.
  Fixes: f36b52740a0 "radv/android: Add android hardware buffer queries."
  Reviewed-by: Eric Anholt <[email protected]>
* radv: implement VK_KHR_separate_depth_stencil_layouts [Samuel Pitoiset, 2019-12-10, 6 files, -7/+93]
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: initialize HTILE for separate depth/stencil aspects [Samuel Pitoiset, 2019-12-10, 3 files, -19/+29]
  It clears either the whole HTILE buffer or only part of it, depending on the
  HTILE mask parameter.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
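The partial clear can be pictured as a masked read-modify-write of each HTILE word: bits selected by the mask take the clear value and the rest keep their old contents, so a mask of all ones clears the whole buffer. This is a sketch only; the mask values in any example are arbitrary placeholders, not the real HTILE depth/stencil bit layout, and `htile_masked_clear` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overwrite only the bits selected by mask in every
 * HTILE word, leaving the other aspect's bits untouched. */
static void htile_masked_clear(uint32_t *htile, size_t count,
                               uint32_t clear_value, uint32_t mask)
{
    for (size_t i = 0; i < count; i++)
        htile[i] = (htile[i] & ~mask) | (clear_value & mask);
}
```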
* radv: do not init HTILE as compressed state when dst layout allows it [Samuel Pitoiset, 2019-12-10, 1 file, -13/+5]
  I don't think this makes much of a difference, and a potential clear following
  the initialization will overwrite HTILE anyway.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: synchronize after performing a separate depth/stencil fast clears [Samuel Pitoiset, 2019-12-10, 1 file, -0/+10]
  For depth+stencil images, the driver might use an optimized path if only one
  aspect is cleared: it clears either the depth or the stencil part of HTILE.
  Because the two aspects might share the same HTILE memory, we have to
  synchronize.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast [Samuel Pitoiset, 2019-12-10, 2 files, -10/+7]
  PA_SC_AA_CONFIG might be updated when conservative rasterization is enabled.
  Because the driver only re-emits the multisample state when the number of
  samples changes, that register value might not be updated correctly.
  Found by inspection; this doesn't fix any known issue.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
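The failure mode described here is a dirty check keyed on the wrong value. A hedged sketch with hypothetical types, not radv's code: comparing only the sample count misses a PA_SC_AA_CONFIG change caused by conservative rasterization, while comparing the packed register value catches it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of per-pipeline multisample state. */
struct ms_state {
    uint32_t num_samples;
    uint32_t pa_sc_aa_config; /* packed register value */
};

/* Fragile check: only re-emit when the sample count changes. */
static bool needs_emit_by_samples(const struct ms_state *old_s,
                                  const struct ms_state *new_s)
{
    return old_s->num_samples != new_s->num_samples;
}

/* Safer check: re-emit whenever the register value itself differs. */
static bool needs_emit_by_register(const struct ms_state *old_s,
                                   const struct ms_state *new_s)
{
    return old_s->pa_sc_aa_config != new_s->pa_sc_aa_config;
}
```

Two pipelines with the same sample count but different conservative-rasterization settings pass the first check unchanged and fail only the second.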
* radv: move emission of two PA_SC_* registers to the pipeline CS [Samuel Pitoiset, 2019-12-10, 2 files, -4/+3]
  They don't have to be updated dynamically.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not use VK_TRUE/VK_FALSE [Samuel Pitoiset, 2019-12-09, 1 file, -12/+12]
  For consistency.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* aco: propagate temporaries into expanded vectors [Daniel Schürmann, 2019-12-07, 1 file, -2/+7]
  Gives a very slight decrease in code size:
  Totals from affected shaders:
  Code Size: 1708488 -> 1702768 (-0.33 %) bytes
  Max Waves: 2858 -> 2855 (-0.10 %)
  Reviewed-by: Rhys Perry <[email protected]>
* aco: improve readfirstlane after uniform ssbo loads on GFX7 [Daniel Schürmann, 2019-12-07, 1 file, -3/+3]
  pipeline-db changes for GFX7:
  80310 shaders in 40472 tests
  Totals:
  SGPRS: 3655900 -> 3643916 (-0.33 %)
  VGPRS: 2678324 -> 2686324 (0.30 %)
  Spilled SGPRs: 1730 -> 1634 (-5.55 %)
  Spilled VGPRs: 14 -> 21 (50.00 %)
  Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
  Code Size: 136106120 -> 135457616 (-0.48 %) bytes
  LDS: 1259 -> 1259 (0.00 %) blocks
  Max Waves: 601014 -> 600206 (-0.13 %)
  Totals from affected shaders:
  SGPRS: 307832 -> 295848 (-3.89 %)
  VGPRS: 267864 -> 275864 (2.99 %)
  Spilled SGPRs: 770 -> 674 (-12.47 %)
  Spilled VGPRs: 14 -> 21 (50.00 %)
  Scratch size: 16 -> 12 (-25.00 %) dwords per thread
  Code Size: 22007488 -> 21358984 (-2.95 %) bytes
  LDS: 65 -> 65 (0.00 %) blocks
  Max Waves: 28668 -> 27860 (-2.82 %)
  Reviewed-by: Rhys Perry <[email protected]>
* aco: use soffset for MUBUF instructions on SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -15/+2]
  pipeline-db changes for GFX7:
  80310 shaders in 40472 tests
  Totals:
  SGPRS: 3655300 -> 3655900 (0.02 %)
  VGPRS: 2677732 -> 2678324 (0.02 %)
  Spilled SGPRs: 1730 -> 1730 (0.00 %)
  Spilled VGPRs: 14 -> 14 (0.00 %)
  Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
  Code Size: 136488364 -> 136106120 (-0.28 %) bytes
  LDS: 1259 -> 1259 (0.00 %) blocks
  Max Waves: 601039 -> 601014 (-0.00 %)
  Totals from affected shaders:
  SGPRS: 316312 -> 316912 (0.19 %)
  VGPRS: 273844 -> 274436 (0.22 %)
  Spilled SGPRs: 770 -> 770 (0.00 %)
  Spilled VGPRs: 14 -> 14 (0.00 %)
  Scratch size: 16 -> 16 (0.00 %) dwords per thread
  Code Size: 22724904 -> 22342660 (-1.68 %) bytes
  LDS: 114 -> 114 (0.00 %) blocks
  Max Waves: 30861 -> 30836 (-0.08 %)
  Reviewed-by: Rhys Perry <[email protected]>
* radv: Enable ACO on GFX7 (Sea Islands) [Daniel Schürmann, 2019-12-07, 1 file, -2/+3]
  This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
  Note that shader_ballot works correctly, but its performance seems to be worse.
  To enable shader_ballot, use RADV_PERFTEST=shader_ballot.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: return to loop_active mask at continue_or_break blocks [Daniel Schürmann, 2019-12-07, 1 file, -13/+4]
  Reviewed-by: Rhys Perry <[email protected]>
* radv: disable Youngblood app profile if ACO is used [Daniel Schürmann, 2019-12-07, 1 file, -2/+3]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement exclusive scan for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -2/+35]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement inclusive_scan for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -5/+41]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement (clustered) reductions for SI/CI [Daniel Schürmann, 2019-12-07, 2 files, -39/+74]
  Reviewed-by: Rhys Perry <[email protected]>
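For reference, the scans and reductions implemented in the three commits above have simple serial semantics over the lanes of a wave. This sketch uses integer addition and assumes every lane is active; it shows what the results must be, not how ACO computes them on SI/CI. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive scan: out[i] = in[0] + ... + in[i]. */
static void inclusive_scan_add(const int32_t *in, int32_t *out, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Exclusive scan: out[i] = in[0] + ... + in[i - 1]; lane 0 gets the identity (0). */
static void exclusive_scan_add(const int32_t *in, int32_t *out, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc;
        acc += in[i];
    }
}

/* Clustered reduction: every lane receives the sum over its cluster of
 * cluster_size consecutive lanes (n is assumed to be a multiple of it). */
static void clustered_reduce_add(const int32_t *in, int32_t *out, size_t n,
                                 size_t cluster_size)
{
    for (size_t base = 0; base < n; base += cluster_size) {
        int32_t sum = 0;
        for (size_t i = base; i < base + cluster_size; i++)
            sum += in[i];
        for (size_t i = base; i < base + cluster_size; i++)
            out[i] = sum;
    }
}
```

A full (non-clustered) reduction is the special case where cluster_size equals the wave size.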
* aco: don't use a scalar temporary for reductions on GFX10 [Daniel Schürmann, 2019-12-07, 2 files, -3/+3]
  This patch also adds the scalar temporary for scans on SI/CI.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: flush denorms after fmin/fmax on pre-GFX9 [Daniel Schürmann, 2019-12-07, 1 file, -15/+46]
  Reviewed-by: Rhys Perry <[email protected]>
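The message gives no rationale. Assuming the intent is that fmin/fmax on these chips can pass denormals through even when the shader's float mode asks for them to be flushed, the fix-up amounts to a flush-to-zero of the result. A bit-level sketch of that operation on IEEE-754 binary32 (hypothetical helper, not ACO code):

```c
#include <stdint.h>
#include <string.h>

/* Flush a binary32 denormal to a zero of the same sign; normal values,
 * zeros, infinities and NaNs pass through unchanged. */
static float flush_denorm_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int is_denorm = (bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0;
    if (is_denorm)
        bits &= 0x80000000u; /* keep only the sign bit */
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```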
* radv: only flush scalar cache for SSBO writes with ACO on GFX8+ [Daniel Schürmann, 2019-12-07, 1 file, -1/+2]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: disable disassembly for SI/CI due to lack of support by LLVM [Daniel Schürmann, 2019-12-07, 1 file, -0/+4]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement 64bit ine/ieq for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -5/+7]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement 64bit i2b for SI /CI [Daniel Schürmann, 2019-12-07, 1 file, -2/+7]
  Reviewed-by: Rhys Perry <[email protected]>
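A 64-bit equality, inequality or integer-to-boolean test can be built from the two 32-bit halves, which is presumably needed because SI/CI lack the scalar 64-bit compares (s_cmp_eq_u64/s_cmp_lg_u64) that arrived with GFX8. The sketch shows the decomposition in plain C; whether ACO emits exactly this sequence is an assumption, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit equality from two 32-bit compares: both halves must match. */
static bool ieq64_by_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
    return (a_lo == b_lo) && (a_hi == b_hi);
}

/* 64-bit inequality is the negation of equality. */
static bool ine64_by_halves(uint64_t a, uint64_t b)
{
    return !ieq64_by_halves(a, b);
}

/* i2b: a 64-bit integer is true iff any bit in either half is set. */
static bool i2b64_by_halves(uint64_t a)
{
    return ((uint32_t)a | (uint32_t)(a >> 32)) != 0;
}
```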
* aco: make 1/2*PI a literal constant on SI/CI [Daniel Schürmann, 2019-12-07, 4 files, -15/+19]
  Reviewed-by: Rhys Perry <[email protected]>
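Assumed background, taken from the public GCN ISA documents rather than from the patch: the float inline constants are 0.0, ±0.5, ±1.0, ±2.0 and ±4.0, and the 1/(2π) inline constant (0x3e22f983 as binary32) only exists from GFX8 on, so on SI/CI that value must be encoded as a 32-bit literal. A hypothetical check:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chip generations, numbered like the GFXn names. */
enum gfx_level { GFX6 = 6, GFX7 = 7, GFX8 = 8, GFX9 = 9 };

/* Returns true if the binary32 bit pattern can use a float inline
 * constant instead of a 32-bit literal (assumed rules, see above). */
static bool is_inline_f32(uint32_t bits, enum gfx_level gfx)
{
    static const uint32_t common[] = {
        0x00000000u,              /* 0.0 */
        0x3f000000u, 0xbf000000u, /* +-0.5 */
        0x3f800000u, 0xbf800000u, /* +-1.0 */
        0x40000000u, 0xc0000000u, /* +-2.0 */
        0x40800000u, 0xc0800000u, /* +-4.0 */
    };
    for (unsigned i = 0; i < sizeof common / sizeof common[0]; i++)
        if (bits == common[i])
            return true;
    /* 1/(2*pi) is only an inline constant from GFX8 on. */
    return gfx >= GFX8 && bits == 0x3e22f983u;
}
```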
* aco: implement 64bit VGPR shifts for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -7/+27]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI [Daniel Schürmann, 2019-12-07, 9 files, -35/+72]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: fix disassembly of writelane instructions. [Daniel Schürmann, 2019-12-07, 1 file, -1/+7]
  ACO writes an unused third operand for internal use, which makes LLVM
  recognize it as an illegal instruction.
  Reviewed-by: Rhys Perry <[email protected]>
* aco: recognize SI/CI SMRD hazards [Daniel Schürmann, 2019-12-07, 1 file, -2/+27]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement quad swizzles for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -30/+75]
  Reviewed-by: Rhys Perry <[email protected]>
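Reference semantics of a quad swizzle, as a sketch: lanes are grouped in fours, and each lane reads the value of the lane in its own quad chosen by a 2-bit selector for its position. The SI/CI lowering itself is not shown, and `quad_swizzle` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* out[i] = in[quad_base(i) + sel[i % 4]], where quad_base(i) = i & ~3.
 * n is assumed to be a multiple of 4 and each sel entry to be 0..3. */
static void quad_swizzle(const uint32_t *in, uint32_t *out, size_t n,
                         const uint8_t sel[4])
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[(i & ~(size_t)3) + sel[i & 3]];
}
```

For example, sel = {1, 0, 3, 2} swaps neighbouring lanes, and sel = {0, 0, 0, 0} broadcasts the first lane of each quad.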
* aco: move buffer_store data to VGPR if needed [Daniel Schürmann, 2019-12-07, 1 file, -1/+1]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement nir_op_isign on SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -2/+7]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: only use scalar loads for readonly buffers on SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -1/+1]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: implement nir_op_fquantize2f16 for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -7/+16]
  Reviewed-by: Rhys Perry <[email protected]>
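nir_op_fquantize2f16 rounds a 32-bit float to the nearest value representable in half precision and returns it as a 32-bit float. A bit-level sketch under stated assumptions: round-to-nearest-even, overflow to infinity, and magnitudes below the smallest normal half (2^-14) become a signed zero, which is how NIR's definition reads to me and which SPIR-V's OpQuantizeToF16 permits. It illustrates the semantics only, not ACO's instruction sequence.

```c
#include <stdint.h>
#include <string.h>

/* Round a binary32 value to the nearest binary16-representable value and
 * return it as binary32 (assumed semantics, see above). */
static float quantize_to_f16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t sign = bits & 0x80000000u;
    uint32_t mag = bits & 0x7fffffffu;

    if (mag >= 0x7f800000u)   /* infinity or NaN: unchanged */
        return x;
    if (mag < 0x38800000u) {  /* |x| < 2^-14: signed zero */
        bits = sign;
    } else {
        /* Keep 10 mantissa bits, rounding ties to even; a carry may
         * correctly bump the exponent. */
        mag = (mag + 0x0fffu + ((mag >> 13) & 1u)) & ~0x1fffu;
        /* Above 65504 (0x477fe000), the largest half, overflow to infinity. */
        bits = sign | (mag > 0x477fe000u ? 0x7f800000u : mag);
    }
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```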
* aco: fix SMEM offsets for SI/CI [Daniel Schürmann, 2019-12-07, 1 file, -1/+2]
  Reviewed-by: Rhys Perry <[email protected]>
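A likely source of such offset bugs is that the immediate offset is counted in different units per generation. Assumed encoding rules, taken from the public ISA documents rather than from the patch: SI/CI SMRD takes an 8-bit immediate offset in dwords, while GFX8 SMEM takes a 20-bit immediate offset in bytes. The helper and enum below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chip generations, numbered like the GFXn names. */
enum chip { CHIP_GFX6 = 6, CHIP_GFX7 = 7, CHIP_GFX8 = 8 };

/* Try to encode a byte offset as a scalar-memory immediate.
 * Assumed rules: GFX6/GFX7 SMRD takes an 8-bit offset in dwords,
 * GFX8 SMEM takes a 20-bit offset in bytes. */
static bool encode_smem_imm_offset(enum chip level, uint32_t byte_offset,
                                   uint32_t *encoded)
{
    if (level <= CHIP_GFX7) {
        if (byte_offset % 4 != 0 || byte_offset / 4 > 0xffu)
            return false; /* needs a literal or an SGPR offset instead */
        *encoded = byte_offset / 4;
        return true;
    }
    if (byte_offset > 0xfffffu)
        return false;
    *encoded = byte_offset;
    return true;
}
```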
* aco: SI/CI - fix sampler aniso [Daniel Schürmann, 2019-12-07, 1 file, -5/+20]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: handle gfx7 int8/10 clamping on exports [Dave Airlie, 2019-12-07, 1 file, -8/+37]
  Co-authored-by: Daniel Schürmann <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
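A sketch of what clamping for 8- and 10-bit integer color exports amounts to, assuming the shader has to saturate values itself because the GFX7 export path does not (the message does not spell this out, nor which exact formats are covered). Unsigned values saturate to [0, 2^bits - 1] and signed ones to [-2^(bits-1), 2^(bits-1) - 1]. The helper names are hypothetical.

```c
#include <stdint.h>

/* Saturate an unsigned integer to a 'bits'-wide unsigned range (1 <= bits <= 31). */
static uint32_t clamp_uint_export(uint32_t v, unsigned bits)
{
    uint32_t max = (1u << bits) - 1u;
    return v > max ? max : v;
}

/* Saturate a signed integer to a 'bits'-wide signed range (2 <= bits <= 31). */
static int32_t clamp_sint_export(int32_t v, unsigned bits)
{
    int32_t max = (int32_t)((1u << (bits - 1)) - 1u);
    int32_t min = -max - 1;
    return v > max ? max : (v < min ? min : v);
}
```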
* aco: Initial GFX7 Support [Daniel Schürmann, 2019-12-07, 4 files, -72/+95]
  Reviewed-by: Rhys Perry <[email protected]>
* aco: refactor visit_store_fs_output() to use the Builder [Daniel Schürmann, 2019-12-07, 1 file, -49/+15]
  Reviewed-by: Rhys Perry <[email protected]>
* aco/wave32: Fix reductions. [Timur Kristóf, 2019-12-04, 3 files, -30/+45]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/wave32: Allow setting the subgroup ballot size to 64-bit. [Timur Kristóf, 2019-12-04, 2 files, -4/+8]
  Previously, this only worked when the ballot size matched the lane mask size.
  This patch makes it possible to set the ballot size to either 32-bit or
  64-bit in wave32 mode.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/wave32: Use wave_size for barrier intrinsic. [Timur Kristóf, 2019-12-04, 2 files, -3/+3]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Rhys Perry <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
* aco/wave32: Fix load_local_invocation_index to support wave32. [Timur Kristóf, 2019-12-04, 1 file, -3/+15]
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
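A model of how a local invocation index can be derived from a lane index. The lane index comes from an mbcnt-style count of mask bits below the current lane: in wave64 the count spans both 32-bit halves of the mask (v_mbcnt_lo plus v_mbcnt_hi), whereas in wave32 the low half is enough. Passing the wave's index within the workgroup as a plain parameter is a simplification, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of mbcnt: number of set bits in mask below lane 'lane' (lane < 64). */
static unsigned mbcnt(uint64_t mask, unsigned lane)
{
    uint64_t below = mask & ((1ull << lane) - 1ull);
    unsigned count = 0;
    for (; below != 0; below &= below - 1) /* clear lowest set bit */
        count++;
    return count;
}

/* Local invocation index = wave index in the workgroup * wave size
 * + lane index within the wave (mbcnt of an all-lanes mask). */
static unsigned local_invocation_index(unsigned wave_index, unsigned lane,
                                       unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    assert(lane < wave_size);
    uint64_t all_lanes = wave_size == 64 ? ~0ull : 0xffffffffull;
    return wave_index * wave_size + mbcnt(all_lanes, lane);
}
```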
* aco/wave32: Use lane mask regclass for exec/vcc. [Timur Kristóf, 2019-12-04, 12 files, -209/+250]
  Currently, all uses of exec and vcc are hardcoded to the s2 regclass.
  This commit makes it possible to use s1 in wave32 mode and s2 in wave64 mode.
  Signed-off-by: Timur Kristóf <[email protected]>
  Reviewed-by: Daniel Schürmann <[email protected]>
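The regclass change amounts to sizing every lane mask (exec, vcc, ballot results) by the wave size: one 32-bit SGPR (s1) in wave32 and a pair of SGPRs (s2) in wave64. A small model of that rule, including the zero-extension assumed to be needed when a wave32 ballot is returned as a 64-bit subgroup ballot; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 32-bit SGPRs a lane mask occupies: s1 for wave32, s2 for wave64. */
static unsigned lane_mask_sgprs(unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return wave_size / 32;
}

/* Value of exec when every lane of the wave is active. */
static uint64_t full_exec_mask(unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return wave_size == 64 ? ~0ull : 0xffffffffull;
}

/* A ballot computed in a wave of wave_size lanes, returned with a 64-bit
 * ballot size: lanes that do not exist read as zero. */
static uint64_t ballot_as_64(uint64_t lane_mask, unsigned wave_size)
{
    return lane_mask & full_exec_mask(wave_size);
}
```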