path: root/src/amd
Commit message | Author | Age | Files | Lines
* aco: coalesce v_mad's accumulator with definition's affinitiesDaniel Schürmann2020-04-221-15/+13
| | | | | | | | Totals from affected shaders: Code Size: 8922676 -> 8915192 (-0.08 %) bytes Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: use upper part of gap in register file if it is beneficial for stridingDaniel Schürmann2020-04-221-5/+16
| | | | | | | | | | | Totals from affected shaders: SGPRS: 1717288 -> 1716984 (-0.02 %) VGPRS: 1305924 -> 1304904 (-0.08 %) Code Size: 138508892 -> 138420144 (-0.06 %) bytes Max Waves: 115726 -> 115735 (0.01 %) Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: try to always find a register with stride for even sizesDaniel Schürmann2020-04-221-2/+4
| | | | | | | | | | | Totals from affected shaders: SGPRS: 1162400 -> 1162400 (0.00 %) VGPRS: 947364 -> 946960 (-0.04 %) Code Size: 98399300 -> 98399004 (-0.00 %) bytes Max Waves: 74665 -> 74682 (0.02 %) Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: stop get_reg_simple after reaching max_used_gprDaniel Schürmann2020-04-221-1/+7
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: refactor get_reg_simple() to return early on exact matchesDaniel Schürmann2020-04-221-25/+22
| | | | | | | in the best fit algorithm Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: don't create vector affinities for operands which are not killed or are duplicatesDaniel Schürmann2020-04-221-1/+1
| | | | | | | | | | | | | Totals from affected shaders: SGPRS: 825184 -> 825184 (0.00 %) VGPRS: 697640 -> 697240 (-0.06 %) Code Size: 79244104 -> 79201072 (-0.05 %) bytes Max Waves: 42388 -> 42386 (-0.00 %) Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: allocate full register for subdword definitions if HW doesn't support itDaniel Schürmann2020-04-222-5/+26
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: move attempt to find strided register into get_reg_simple()Daniel Schürmann2020-04-221-8/+9
| | | | | | | | | | | This simplifies code and helps some shaders Totals from affected shaders: Code Size: 51227172 -> 51202216 (-0.05 %) bytes Max Waves: 19955 -> 19948 (-0.04 %) Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: use DefInfo in more places to simplify RADaniel Schürmann2020-04-221-42/+19
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: create and use DefInfo struct in RADaniel Schürmann2020-04-221-45/+71
| | | | | | | for maintaining all information necessary to find a register. Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: create pseudo dummy instruction in RA to be used for live-range splitsDaniel Schürmann2020-04-221-2/+6
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: refactor get_reg() to also handle affinitiesDaniel Schürmann2020-04-221-60/+51
| | | | | | | | | | | This simplifies definition handling and helps a few shaders Totals from affected shaders: Code Size: 659540 -> 659376 (-0.02 %) bytes Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: refactor get_reg() to take Temp instead of RegClassDaniel Schürmann2020-04-221-85/+84
| | | | | | | | This patch also moves get_reg_specified() and get_reg_vec() before get_reg() to make use of it later. Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: simplify operand handling in RADaniel Schürmann2020-04-221-72/+53
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
* aco: implement 64-bit sgpr swapsRhys Perry2020-04-221-1/+10
| | | | | | | | | | | | | | | In our pipeline-db, helps almost exclusively Detroit: Become Human. Totals from 6726 (5.36% of 125503) affected shaders: CodeSize: 74680952 -> 74102228 (-0.77%) Instrs: 14551507 -> 14406001 (-1.00%) Cycles: 1748272436 -> 1690173104 (-3.32%) VMEM: 964671 -> 964058 (-0.06%) Copies: 1993312 -> 1847806 (-7.30%) Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
* aco: implement sub-dword swapsRhys Perry2020-04-223-140/+320
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
* aco: add VOP3P_instructionRhys Perry2020-04-224-19/+85
| | | | | | | | | The optimizer isn't yet updated to handle this, since lower_to_hw_instr will be the only user for now. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
* aco: fix copy statistic for 64-bit vgpr constant copyRhys Perry2020-04-221-0/+1
| | | | | | | | The statistic is in units of instructions. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
* radv: use common nir_convert_ycbcrJonathan Marek2020-04-201-122/+9
| | | | | | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: D Scott Phillips <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
* aco: move src1 to vgpr instead of using VOP3 for VOP2 instructions during iselDaniel Schürmann2020-04-201-9/+1
| | | | | | | | | Is simpler and helps a couple of shaders. Totals from affected shaders: (Vega) Code Size: 16341296 -> 16335460 (-0.04 %) bytes Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4642>
* aco: fix 64bit fsubDaniel Schürmann2020-04-201-1/+1
| | | | | | | | Fixes: 425558bfd595ed3a7a049ad0f47a46b8b3c4691e ('aco: use v_subrev_f32 for fsub with an sgpr operand in src1') Reviewed-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4642>
* radeonsi: skip vs output optimizations for some outputsPierre-Eric Pelloux-Prayer2020-04-203-1/+6
| | | | | | | | | | If PT_SPRITE_TEX is enabled, PS inputs are overridden at runtime so we can't apply the vs output optim. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2747 Fixes: 3ec9975555d ("radeonsi: eliminate trivial constant VS outputs") Reviewed-by: Marek Olšák <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4559>
* aco: use v_subrev_f32 for fsub with an sgpr operand in src1Daniel Schürmann2020-04-191-1/+1
| | | | | | | | This fixes an accidentally introduced regression. Fixes: 9be4be515f2a08b9c9e5ae1fc4c5dc9a830c2337 ('aco: implement 16-bit nir_op_fsub/nir_op_fadd') Reviewed-by: Samuel Pitoiset <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4633>
* Fix promotion of floats to doublesAlbert Astals Cid2020-04-183-7/+7
| | | | | | | | | Use the f variants of the math functions if the input parameter is a float. This saves converting from float to double and running the double variant of the math function, which gains no precision at all. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3969>
* aco: fix exporting the viewport index if the fragment shader needs itSamuel Pitoiset2020-04-172-2/+4
| | | | | | | | | | | | | It's like the layer, it has to be exported via the pos and also as a varying if the fragment shader reads it. Fixes dEQP-VK.draw.shader_viewport_index.fragment_shader_* Cc: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4564>
* radv/llvm: fix exporting the viewport index if the fragment shader needs itSamuel Pitoiset2020-04-171-0/+1
| | | | | | | | | | | | | It's like the layer, it has to be exported via the pos and also as a varying if the fragment shader reads it. Fixes dEQP-VK.draw.shader_viewport_index.fragment_shader_* Cc: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Rhys Perry <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4564>
* radv: set missing SHARED_VGPR_CNT for NGG VS and ACOSamuel Pitoiset2020-04-171-1/+1
| | | | | | | | | | | | | shuffle is implemented with shared VGPRs with ACO and Wave64. Fixes dEQP-VK.subgroups.shuffle.framebuffer.subgroupshuffle*_vertex with Wave64. Fixes: c24d9522dae ("radv: Enable ACO for NGG VS/TES, but disable NGG for ACO GS.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4595>
* radv: fix geometry shader primitives query with ACO on GFX10Samuel Pitoiset2020-04-174-6/+7
| | | | | | | | | | | Fixes dEQP-VK.query_pool.statistics_query.*.geometry_shader_primitives.*. Fixes: c24d9522dae ("radv: Enable ACO for NGG VS/TES, but disable NGG for ACO GS.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timur Kristóf <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4593>
* aco: add missing scc clobber to nir_op_unpack_32_2x16_split_yRhys Perry2020-04-161-1/+1
| | | | | | | | | The ISA doc is inconsistent whether this instruction writes SCC. It does. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4552>
* aco: implement various 8/16-bit conversionsRhys Perry2020-04-161-165/+94
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4552>
* amd/addrlib: Use enum instead of sparse chars to identify dimensionsMichel Dänzer2020-04-163-207/+109
| The enum values can be used directly as indices into arrays, simplifying the code. This significantly cuts down the number of CPU cycles spent inside:
|
| * Addr::V2::Gfx9Lib::HwlComputeDccAddrFromCoord:
|   [distribution plot of the x and + samples]
|       N   Min    Max    Median  Avg     Stddev
|   x   5   14.89  15.44  15.14   15.156  0.24704251
|   +   5   8.26   9.96   9.37    9.282   0.6262747
|   Difference at 95.0% confidence: -5.874 +/- 0.694294, -38.7569% +/- 4.58098% (Student's t, pooled s = 0.476051)
|
| * Addr::V2::CoordEq::solve:
|   [distribution plot of the x and + samples]
|       N   Min    Max    Median  Avg     Stddev
|   x   5   8.11   9.59   9.21    9.02    0.55605755
|   +   5   4.28   5.05   4.48    4.564   0.32867917
|   Difference at 95.0% confidence: -4.456 +/- 0.666135, -49.4013% +/- 7.38509% (Student's t, pooled s = 0.456744)
|
| (The measured numbers are the percentages of samples inside the respective function and its callees for `perf record --call-graph=fp kitty -e false`, measured on a Lenovo Thinkpad E595 (Picasso))
|
| v2:
| * Add missed 'coords[dim] |= bit << ord;' (Pierre-Eric Pelloux-Prayer)
| * Put 'ADDR_ASSERT(dim < DIM_S);' where the code previously had 'ADDR_ASSERT_ALWAYS()' for the s/m dimensions.
| * Use 1u for BitsValid (since it's 32-bit unsigned values).
| * Use parens in 'BitsValid[dim] & (1u << ord)' for clarity.
|
| Acked-by: Marek Olšák <[email protected]> # v1 Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4523>
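The idea behind the addrlib change above can be sketched in a few lines of C. This is only an illustration, not the addrlib code: the enum name `dim`, its exact ordering, `NUM_DIMS`, and both helper functions are assumptions made for the example. Only the expressions `coords[dim] |= bit << ord` and `BitsValid[dim] & (1u << ord)` come from the commit message.

```c
#include <stdint.h>

/* Hypothetical dimension enum; the values double as array indices.
 * The ordering is an assumption for this sketch. */
enum dim { DIM_X, DIM_Y, DIM_Z, DIM_S, DIM_M, NUM_DIMS };

/* Set bit `ord` of the coordinate selected by `d`. With a dense enum this
 * is a single indexed access instead of a switch over sparse characters. */
static void set_coord_bit(uint32_t coords[NUM_DIMS], enum dim d,
                          uint32_t ord, uint32_t bit)
{
    coords[d] |= bit << ord;
}

/* Return nonzero if bit `ord` is marked valid for dimension `d`. */
static int bit_is_valid(const uint32_t bits_valid[NUM_DIMS], enum dim d,
                        uint32_t ord)
{
    return (bits_valid[d] & (1u << ord)) != 0;
}
```

Because the enum is dense and starts at zero, every per-dimension table can be a plain array of `NUM_DIMS` entries, which is presumably where the reported cycle savings come from.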
* radv: do not abort with unknown/unimplemented descriptor typesSamuel Pitoiset2020-04-161-6/+1
| | | | | | | | | | | | | | | | | | To work around a crash with Wolfenstein: Youngblood, because the game creates one descriptor with VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV... I reported the problem to Machine Games, but still no answer, so let's remove the unreachable calls (which are technically not unreachable for buggy apps) to help gamers. Note that AMDVLK and AMDGPU-PRO don't crash because they ignore unsupported descriptor types. Cc: <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4571>
* aco: fix emitting stream output with tess eval shadersSamuel Pitoiset2020-04-161-1/+1
| | | | | | | | Fixes dEQP-VK.transform_feedback.simple.winding_patch_list_12. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timur Kristóf <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4553>
* aco: implement nir_op_f2i8/nir_op_f2u8Samuel Pitoiset2020-04-161-2/+4
| | | | | | | | I think we should really refactor the conversions path. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4551>
* radv/aco: do not advertise VK_KHR_shader_subgroup_extended_typesSamuel Pitoiset2020-04-152-3/+3
| | | | | | | | | It's unsupported because small bitsizes are still not completely supported. It should have been disabled by default with ACO. Acked-by: Daniel Schürmann <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4549>
* aco: fix 1D textureGrad() on GFX9Rhys Perry2020-04-151-1/+1
| | | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Fixes: 6f718edcedd ('aco: simplify gathering of MIMG address components') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4550>
* aco: fix nir_op_frexp_exp with 16-bit floats and negative exponentsSamuel Pitoiset2020-04-151-1/+6
| | | | | | | | | | | | | | | | | v_frexp_exp_i16_f16 returns the two's complement for negative exponents. For example, with 0.333252 it returns 0.666504 for the mantissa and 65535 for the exponent (-1 in decimal). RADV/LLVM and AMDVLK do a v_bfe_i32 and AMDGPU-PRO uses SDWA with the sign extension bit set. The latter is probably what we want to do in the long term, but for now RA doesn't support changing non-SDWA instructions to SDWA if useful/needed. Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.frexp.compute.*. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4546>
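The sign handling described in the frexp fix above can be illustrated with a minimal C sketch. It assumes the raw 16-bit exponent sits in the low half of a 32-bit value. `sign_extend_i16` is a hypothetical helper name, not an ACO function; it only mimics the effect of a signed bitfield extract with offset 0 and width 16.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low 16 bits of a 32-bit value as a
 * signed two's complement number. */
static int32_t sign_extend_i16(uint32_t raw)
{
    uint32_t low = raw & 0xffffu;
    /* If bit 15 is set, the value is negative: subtract 2^16. */
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}
```

With the value from the commit message, `sign_extend_i16(65535)` yields -1, the exponent expected for 0.333252.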
* aco: clear moved operands in get_reg_create_vector()Rhys Perry2020-04-141-1/+11
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4507>
* aco: improve p_create_vector RA for sub-dword operandsRhys Perry2020-04-141-17/+32
| | | | | | | | | There are still improvements needed for sub-dword definitions, but that's not as simple. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4507>
* aco: fix p_extract_vector validationRhys Perry2020-04-141-1/+1
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4507>
* aco: improve vector optimization with sub-dword vectorsRhys Perry2020-04-141-11/+22
| | | | | | Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4507>
* radv: use RMW packets for updating the maximum sample distanceSamuel Pitoiset2020-04-141-9/+3
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4531>
* radv: add radeon_set_context_reg_rmw() helperSamuel Pitoiset2020-04-141-0/+12
| | | | | | | | | | | For emitting RMW packets in the command stream. This new helper will be useful for implementing extended dynamic states to only overwrite the fields that need to be updated instead of storing more values in the pipeline. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4531>
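The read-modify-write idea behind the helper above can be modelled on a plain 32-bit value. This sketch does not show the real packet format or the signature of radeon_set_context_reg_rmw(). `reg_rmw` and its parameters are hypothetical names used only to show how a mask limits an update to selected fields.

```c
#include <stdint.h>

/* Hypothetical model of a masked register update: only the bits selected
 * by `mask` are replaced; all other bits of the old value are preserved. */
static uint32_t reg_rmw(uint32_t old_value, uint32_t value, uint32_t mask)
{
    return (old_value & ~mask) | (value & mask);
}
```

For instance, `reg_rmw(0xffff0000, 0xab, 0xff)` changes only the low byte and leaves the upper 24 bits untouched. This is why a dynamic state can update one field without the pipeline having to store the rest of the register.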
* aco: fix p_extract_vector optimization in presence of unequally sized vector operandsDaniel Schürmann2020-04-131-22/+27
| | | | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4506>
* aco: fix nir_op_pack_32_2x16_split if one operand is a constantSamuel Pitoiset2020-04-131-0/+2
| | | | | | | | | Because 16-bit constants are represented with the s1 RegClass, we have to extract the low half. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4509>
* aco: implement 16-bit nir_op_f2i64/nir_op_f2u64Samuel Pitoiset2020-04-131-4/+10
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4509>
* aco: fix f2i64/f2u64 with sgprs if the exponent computation overflowsSamuel Pitoiset2020-04-131-5/+5
| | | | | | | | This fixes f16->{i64,u64} conversions for +0/-0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Daniel Schürmann <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4509>
* aco: make some reg_file helpers private and fix their usesDaniel Schürmann2020-04-101-25/+29
| | | | | | | Fixes various subdword RA issues Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4492>
* aco: rename aco_lower_bool_phis() -> aco_lower_phis()Daniel Schürmann2020-04-105-7/+7
| | | | | | | We also lower subdword phis, now. Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4492>
* aco: lower subdword phis with SGPR operandsDaniel Schürmann2020-04-101-0/+26
| | | | | Reviewed-by: Rhys Perry <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4492>