path: root/src/amd/common
Commit message | Author | Age | Files | Lines
* ac: fix typo DSL_SEL -> DST_SEL | Marek Olšák | 2018-07-26 | 1 | -2/+2
* nir: rename f2f16_undef to f2f16 | Karol Herbst | 2018-07-24 | 1 | -1/+1
    we need rounding modes on other conversions involving floats and it is easier to rename f2f16_undef than renaming all the other ones.
    v2: rebased on master
    Reviewed-by: Jason Ekstrand <[email protected]>
    Acked-by: Rob Clark <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* radeonsi: Add debug option to enable LLVM GlobalISel (v2) | Tom Stellard | 2018-07-23 | 3 | -2/+18
    R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than SelectionDAG for instruction selection.
    v2: mareko: move the helper to src/amd/common
    Signed-off-by: Marek Olšák <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* ac: add support for 16bit load_push_constant | Daniel Schürmann | 2018-07-23 | 1 | -0/+20
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add support for 16bit input/output | Daniel Schürmann | 2018-07-23 | 1 | -1/+7
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add support for 16bit buffer loads | Daniel Schürmann | 2018-07-23 | 1 | -40/+55
    v2: Fixed dvec3 loads (bas)
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add support for 16bit UBO loads | Daniel Schürmann | 2018-07-23 | 3 | -3/+51
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add support for 16bit ssbo stores | Daniel Schürmann | 2018-07-23 | 1 | -60/+84
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add 16bit conversion operations | Daniel Schürmann | 2018-07-23 | 2 | -9/+31
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: add a workaround for bitfield_extract when count is 0 | Samuel Pitoiset | 2018-07-19 | 1 | -3/+17
    LLVM 7 returns incorrect results when count is 0, something has been broken since LLVM 6.
    Of course, the best solution is to fix LLVM but this workaround works as expected for now.
    Original workaround by Philippe Rebohle.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107276
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
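The entry above describes guarding a bitfield extract against a zero bit count. The following is a minimal sketch of that idea in plain C, not the code from the commit: `bfe_u32_unguarded` and `bfe_u32` are hypothetical names, and the shift-based primitive merely stands in for an operation whose result cannot be trusted when `count` is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an extract primitive that is only valid for
 * 1 <= count and offset + count <= 32. With count == 0 the right shift
 * would be by 32 bits, which is undefined behaviour in C. */
static uint32_t bfe_u32_unguarded(uint32_t value, uint32_t offset, uint32_t count)
{
    assert(count >= 1 && offset + count <= 32);
    /* Move the field to the top of the word, then down to bit 0. */
    return (value << (32u - offset - count)) >> (32u - count);
}

/* Guarded wrapper (illustrative): select 0 when count == 0, otherwise
 * defer to the primitive. */
static uint32_t bfe_u32(uint32_t value, uint32_t offset, uint32_t count)
{
    if (count == 0)
        return 0;
    return bfe_u32_unguarded(value, offset, count);
}
```

With this guard, extracting zero bits yields 0 regardless of `value` and `offset`, while every other count takes the normal path.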
* ac: run LLVM optimization passes only on the final function after inlining | Marek Olšák | 2018-07-19 | 3 | -0/+14
* radeonsi: add support for Vega20 | Marek Olšák | 2018-07-12 | 4 | -1/+9
    Reviewed-by: Alex Deucher <[email protected]>
* python: Use the print function | Mathieu Bridon | 2018-07-06 | 1 | -19/+20
    In Python 2, `print` was a statement, but it became a function in Python 3.
    Using print functions everywhere makes the script compatible with Python versions >= 2.6, including Python 3.
    Signed-off-by: Mathieu Bridon <[email protected]>
    Acked-by: Eric Engestrom <[email protected]>
    Acked-by: Dylan Baker <[email protected]>
* python: Stabilize some script outputs | Mathieu Bridon | 2018-07-05 | 1 | -1/+1
    In Python, dictionaries and sets are unordered, and as a result there is no guarantee that running this script twice will produce the same output.
    Using ordered dicts and explicitly sorting items makes the build more reproducible, and will make it possible to verify that we're not breaking anything when we move the build scripts to Python 3.
    Reviewed-by: Eric Engestrom <[email protected]>
* ac: fold LLVMContext creation into ac_llvm_context_init | Marek Olšák | 2018-07-04 | 2 | -4/+4
    Reviewed-by: Dave Airlie <[email protected]>
* ac: add reusable helpers for direct LLVM compilation | Marek Olšák | 2018-07-04 | 3 | -4/+76
    This is basically LLVMTargetMachineEmitToMemoryBuffer inlined and reworked.
    struct ac_compiler_passes (opaque type) contains the main pass manager.
    ac_create_llvm_passes -- the result can go to thread local storage
    ac_destroy_llvm_passes -- can be called by a destructor in TLS
    ac_compile_module_to_binary -- from LLVMModuleRef to ac_shader_binary
    The motivation is to do the expensive call addPassesToEmitFile once per context or thread.
    Reviewed-by: Dave Airlie <[email protected]>
* ac: make some fns static | Dave Airlie | 2018-07-04 | 2 | -13/+6
    Some of the compiler functions are no longer called outside the util file.
    Reviewed-by: Marek Olšák <[email protected]>
* ac/radv: move llvm compiler info to struct and init in one place | Dave Airlie | 2018-07-04 | 2 | -4/+8
    This ports radv to the shared code, however due to a bug in LLVM versions prior to 7, radv cannot add target info at this stage, as it would leak one for every shader compile, however I'd prefer to keep this llvm damage in the shared code, since it isn't the driver at fault here.
    We just add a flag to denote if the driver can support leaking the target info or not, and the common code does the right thing depending on the llvm version.
    Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: port compiler init/destroy out of radeonsi. | Dave Airlie | 2018-07-04 | 2 | -0/+49
    We want to share this code with radv in the future, so port it out of radeonsi.
    Add a return value as radv will want that to know if this succeeds.
    Reviewed-by: Marek Olšák <[email protected]>
* radv/radeonsi: add a check ir tm options | Dave Airlie | 2018-07-04 | 1 | -0/+1
    This doesn't do much yet, but it makes it easier to move the code to a common shared code base.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename si_compiler -> ac_llvm_compiler | Dave Airlie | 2018-07-04 | 1 | -0/+7
    As precursor to moving init to common code, just rename the struct and move it.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* ac: add target library info helpers | Dave Airlie | 2018-07-04 | 2 | -0/+15
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radv: port to use common passmgr code. | Dave Airlie | 2018-07-04 | 1 | -2/+3
    This adds an inline always pass, but otherwise should work the same.
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/radeonsi: refactor out pass manager init to common code. | Dave Airlie | 2018-07-04 | 2 | -0/+32
    Reviewed-by: Marek Olšák <[email protected]>
* ac/radv: split the non-common init_once code from the common target code. (v2) | Dave Airlie | 2018-07-04 | 2 | -2/+7
    This just splits out the non-shared code and reuses ac_get_llvm_target in radv.
    v2: rebase on Marek's patch - fixup brace position/whitespace
    Reviewed-by: Marek Olšák <[email protected]>
* ac: move all LLVM module initialization into ac_create_module | Marek Olšák | 2018-07-02 | 2 | -0/+11
    This removes some ugly code around module initialization.
    Reviewed-by: Dave Airlie <[email protected]>
* ac: set +auto-waitcnt-before-barrier when needed | Marek Olšák | 2018-06-28 | 1 | -2/+4
    This removes useless s_waitcnt before barriers.
    Only radeonsi uses this function.
* radeonsi: move CMASK size computation into ac_surface | Marek Olšák | 2018-06-25 | 2 | -0/+66
    Reviewed-by: Timothy Arceri <[email protected]>
* ac/surface: move cmask_size/alignment into radeon_surf | Marek Olšák | 2018-06-25 | 2 | -11/+11
    cmask_size is changed to uint32_t because it can't be greater than 4GB.
    Reviewed-by: Timothy Arceri <[email protected]>
* ac/nir: Remove deref chain support. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -354/+50
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
* ac/nir: Add deref interp support. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -6/+27
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Add shared atomic deref instr support. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -1/+25
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Add deref based var loads/stores. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -47/+160
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Add deref support to image intrinsics. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -34/+98
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Implement derefs for integer gather4 lowering. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -3/+22
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Support deref instructions in tex instructions. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -8/+31
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Support deref instructions in get_sampler_desc. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -15/+43
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/nir: Implement the deref instr for shared memory. | Bas Nieuwenhuizen | 2018-06-22 | 1 | -0/+31
    v2: Store the result in ctx->ssa_defs.
    Acked-by: Rob Clark <[email protected]>
    Acked-by: Bas Nieuwenhuizen <[email protected]>
    Acked-by: Dave Airlie <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* ac/surface: disallow rotated micro tile mode | Marek Olšák | 2018-06-21 | 1 | -2/+19
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/gpu_info: add radeon_info::num_tcc_blocks | Marek Olšák | 2018-06-19 | 2 | -0/+11
    The values for the radeon winsys were copied from the kernel driver.
    Tested-by: Dieter Nützel <[email protected]>
* ac/surface: Set compressZ for stencil-only surfaces. | Bas Nieuwenhuizen | 2018-06-19 | 1 | -1/+1
    We HTILE compress stencil-only surfaces too.
    CC: 18.1 <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* ac: Clear meminfo to avoid valgrind warning. | Bas Nieuwenhuizen | 2018-06-16 | 1 | -1/+1
    Somehow valgrind misses that the value is initialized by the ioctl.
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: handle undefined EQAA samples in ac_apply_fmask_to_sample | Marek Olšák | 2018-06-13 | 1 | -2/+4
    RADV might wanna use this helper too.
    Tested-by: Dieter Nützel <[email protected]>
* ac/gpu_info: report real total memory sizes | Marek Olšák | 2018-06-13 | 1 | -28/+54
    The change from MIN2 to MAX2 is intentional.
    Cc: 18.1 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix possible truncation of intrinsic name | Timothy Arceri | 2018-06-08 | 1 | -1/+1
    Fixes the gcc warning:
    ‘snprintf’ output between 26 and 33 bytes into a destination of size 32
    Fixes: d5f7ebda3ec0 ("ac: add LLVM build functions for subgroup instrinsics")
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
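The gcc warning quoted above fires when formatted output may not fit its destination buffer. As a rough illustration of how such truncation can be detected at run time (an assumption for illustration, not the fix applied in the commit), the sketch below checks the return value of `snprintf`; `format_name` and its `"%s.%s"` layout are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "<prefix>.<suffix>" into buf and reports
 * whether the whole string fit. snprintf returns the number of characters
 * it would have written without the terminating NUL, so any value >= size
 * means the output was truncated. */
static bool format_name(char *buf, size_t size, const char *prefix, const char *suffix)
{
    int n = snprintf(buf, size, "%s.%s", prefix, suffix);
    return n >= 0 && (size_t)n < size;
}
```

A caller can size the buffer for the longest possible name, or treat a `false` return as an error, instead of silently using a truncated name.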
* amd/common: Fix number of coords for getlod. | Bas Nieuwenhuizen | 2018-06-07 | 1 | -3/+18
    The LLVM 6 code reduced it to a non-array call. We need to do that with the new code too.
    This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.
    Fixes: a9a79934412 "amd/common: use the dimension-aware image intrinsics on LLVM 7+"
    Reviewed-by: Dave Airlie <[email protected]>
* amd/common: use the dimension-aware image intrinsics on LLVM 7+ | Nicolai Hähnle | 2018-06-04 | 1 | -24/+165
    Requires LLVM trunk r329166.
    Acked-by: Marek Olšák <[email protected]>
* radeonsi: remove some old gfx 9.x registers | Marek Olšák | 2018-05-24 | 1 | -48/+0
    Leftover from bring up.
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: don't overallocate mipmapped HTILE | Marek Olšák | 2018-05-24 | 1 | -2/+11
    Tested-by: Dieter Nützel <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: Use DPP for build_ddxy where possible. | Bas Nieuwenhuizen | 2018-05-23 | 1 | -1/+15
    WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM.
    This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark.
    v2: Use ac_build_quad_swizzle.
    Reviewed-by: Nicolai Hähnle <[email protected]>