summaryrefslogtreecommitdiffstats
path: root/src/amd/common
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when availableNicolai Hähnle2018-12-192-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/surface: 3D and cube surfaces are never displayableNicolai Hähnle2018-12-191-3/+5
| | | | Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scanNicolai Hähnle2018-12-191-2/+25
| | | | | | | Allow for a unified but efficient treatment of adding a bitmask over a wave or an entire threadgroup. Reviewed-by: Marek Olšák <[email protected]>
* amd/common: scan/reduce across waves of a workgroupNicolai Hähnle2018-12-192-4/+227
| | | | | | | Order-aware scan/reduce can trade-off LDS traffic for external atomics memory traffic in producer/consumer compute shaders. Reviewed-by: Marek Olšák <[email protected]>
* amd/common: add ac_build_ifccNicolai Hähnle2018-12-192-4/+4
| | | | Reviewed-by: Marek Olšák <[email protected]>
* amd/common: whitespace fixesNicolai Hähnle2018-12-191-10/+8
| | | | Reviewed-by: Marek Olšák <[email protected]>
* amd/sid_tables: add additional python3 compatibility importsNicolai Hähnle2018-12-191-1/+1
| | | | | | This happened to bite me while doing some experiments. Reviewed-by: Marek Olšák <[email protected]>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>
* ac: split 16-bit ssbo loads that may not be dword alignedRhys Perry2018-12-161-0/+2
| | | | | | | Fixes: 7e7ee826982 ('ac: add support for 16bit buffer loads') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114 Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: refactor visit_load_bufferRhys Perry2018-12-162-44/+42
| | | | | | | This is so that we can split different types of loads more easily. Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* amd: remove support for LLVM 6.0Samuel Pitoiset2018-12-066-298/+32
| | | | | | | User are encouraged to switch to LLVM 7.0 released in September 2018. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-4/+8
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <[email protected]>
* amd/addrlib: update Mesa's copy of addrlibNicolai Hähnle2018-11-291-2/+2
| | | | | | | | Update to the internal master as of 2018-11-15. This has a lot of gratuitous whitespace change, but on the plus side it's built using the same tooling that's used for AMDVLK, which should help going forward.
* ac/surface/gfx9: let addrlib choose the preferred swizzle kindNicolai Hähnle2018-11-291-18/+4
| | | | | | | | | Our choices here are simply redundant as long as sin.flags is set correctly. (v2: - remove unused function parameter) Reviewed-by: Marek Olšák <[email protected]>
* radv: remove dependency on addrlib gfx9_enum.hNicolai Hähnle2018-11-291-0/+3
| | | | | | | v2: - use SI_CONTEXT_REG_OFFSET Reviewed-by: Dave Airlie <[email protected]>
* ac: handle cast derefsDave Airlie2018-11-211-0/+3
| | | | | | Just give back the same value for now. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: handle loading from shared pointersDave Airlie2018-11-211-9/+18
| | | | | | | | | | We won't have a var to load from, so don't try to the processing required if we don't need it. This avoids crashes in: dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: avoid casting pointers on bcsel and storesDave Airlie2018-11-213-3/+14
| | | | | | | | For variable pointers we really don't want to case the pointers to int without a good reason, just add a wrapper for bcsel loading and result storing. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix intrinsic name string size in visit_image_atomic()Samuel Pitoiset2018-11-201-1/+1
| | | | | | | | Fixes an assertion in SoTTR. Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use structured intrinsics instead of indexing workaround for GFX9.Bas Nieuwenhuizen2018-11-192-7/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These force the index to be used in the instruction so we don't need the workaround. Totals: SGPRS: 1321642 -> 1321802 (0.01 %) VGPRS: 943664 -> 943788 (0.01 %) Spilled SGPRs: 28468 -> 28480 (0.04 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 52415292 -> 52338932 (-0.15 %) bytes LDS: 400 -> 400 (0.00 %) blocks Max Waves: 233903 -> 233803 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 238344 -> 238504 (0.07 %) VGPRS: 232732 -> 232856 (0.05 %) Spilled SGPRs: 13125 -> 13137 (0.09 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 15752712 -> 15676352 (-0.48 %) bytes LDS: 139 -> 139 (0.00 %) blocks Max Waves: 31680 -> 31580 (-0.32 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/surface: remove the overallocation workaround for Vega12Marek Olšák2018-11-091-4/+0
| | | | | | not needed anymore (probably since the tile_swizzle fix) Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: use LOAD_CONTEXT_REG when loading fast clear valuesSamuel Pitoiset2018-11-081-0/+1
| | | | | | | | | This avoids syncing the Micro Engine. This is only supported for VI+ currently. There is probably a way for using LOAD_CONTEXT_REG on previous chips but that could be done later. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac/nir_to_llvm: fix b2f for f64Timothy Arceri2018-11-071-3/+12
| | | | | | Fixes: d7e0d47b9de3 ("nir: Add a bunch of b2[if] optimizations") Reviewed-by: Dave Airlie <[email protected]>
* amd: Make vgpr-spilling depend on llvm versionJan Vesely2018-11-021-1/+2
| | | | | | | | | The option was removed in LLVM r345763 Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* ac/nir: make use of i1false in few more placesSamuel Pitoiset2018-11-011-3/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radv: use WAIT_REG_MEM_GREATER_OR_EQUAL instead of a magic valueSamuel Pitoiset2018-10-311-0/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: add support for Raven2 (v2)Marek Olšák2018-10-305-0/+16
| | | | | | v2: fix enabling primitive binning Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix ac_build_fdiv for f64Marek Olšák2018-10-291-1/+2
| | | | | | trivial Fixes: a5f35aa742c
* radv: implement VK_EXT_transform_feedbackSamuel Pitoiset2018-10-291-0/+1
| | | | | | | | This implementation should work and potential bugs can be fixed during the release candidates window anyway. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-251-1/+0
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* amd/common: check DRM version 3.27 for JPEG decodeLeo Liu2018-10-231-1/+1
| | | | | | | | | | JPEG was added after DRM version 3.26 Signed-off-by: Leo Liu <[email protected]> Fixes: 4558758c51749(amd/common: add vcn jpeg ip info query) Cc: Boyuan Zhang <[email protected]> Cc: Alex Smith <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* amd/common: add vcn jpeg ip info queryBoyuan Zhang2018-10-231-2/+12
| | | | | Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Leo Liu <[email protected]>
* ac: Fix loading a dvec3 from an SSBOConnor Abbott2018-10-221-2/+2
| | | | | | | | | The comment was wrong, since the loop above casts to a type with the correct bitsize already. Fixes: 7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit buffer loads") Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: Introduce ac_build_expand()Connor Abbott2018-10-222-14/+29
| | | | | | | | And implement ac_bulid_expand_to_vec4() on top of it. Fixes: 7e7ee82698247d8f93fe37775b99f4838b0247dd ("ac: add support for 16bit buffer loads") Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add helpers for fast integer division by a constantMarek Olšák2018-10-162-0/+78
|
* radeonsi: save raster config in screen, add se_tile_repeatMarek Olšák2018-10-162-3/+13
|
* radeonsi: rename si_gfx_* functions to si_cp_*Marek Olšák2018-10-161-0/+1
| | | | and write_event_eop -> release_mem
* radeonsi: make si_gfx_write_event_eop more configurableMarek Olšák2018-10-161-0/+5
|
* ac/nir: Use context-specific LLVM typesAlex Smith2018-10-161-2/+2
| | | | | | | | | | | | | | | LLVMInt*Type() return types from the global context and therefore are not safe for use in other contexts. Use types from our own context instead. Fixes frequent crashes seen when doing multithreaded pipeline creation. Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant" Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads" Cc: "18.2" <[email protected]> Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: emit the GLC bit for SSBO loads/stores when neededSamuel Pitoiset2018-10-123-8/+22
| | | | | | | | | This fixes some new memory model tests: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add ac_build_roundMarek Olšák2018-10-063-3/+19
|
* ac: correct PKT3_COPY_DATA definitionsMarek Olšák2018-10-061-2/+9
|
* ac: simplify LLVM alloca helpersMarek Olšák2018-10-061-7/+4
|
* ac: define all address spaces properlyMarek Olšák2018-10-063-10/+12
|
* radv: do not use the availability bit for timestamp queriesSamuel Pitoiset2018-09-281-0/+1
| | | | | | | | | | | It's unnecessary because we can just check if the timestamp is to different to the default value when a pool is created or resetted. Instead of waiting for the availability bit to be 1, we have to emit a not equal WAIT_REG_MEM for checking if the timestamp is ready. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* ac: add 16-bit support to ac_build_bitfield_reverse()Samuel Pitoiset2018-09-171-0/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_bit_count()Samuel Pitoiset2018-09-171-0/+5
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_find_lsb()Samuel Pitoiset2018-09-171-2/+13
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_umsb()Samuel Pitoiset2018-09-171-2/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16-bit support to ac_build_isign()Samuel Pitoiset2018-09-171-5/+16
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>