path: root/src/gallium/drivers/radeonsi
Commit message | Author | Age | Files | Lines
* radeonsi: don't set spi_ps_input_* for monolithic shaders | Marek Olšák | 2019-06-24 | 1 | -13/+26
  The driver doesn't use these values and ac_rtld has assertions expecting the value of 0.
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: rename and re-document cache flush flags | Marek Olšák | 2019-06-24 | 10 | -64/+66
  SMEM and VMEM caches are L0 on gfx10.
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: fix AMD_DEBUG=nofmask | Marek Olšák | 2019-06-24 | 4 | -14/+20
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: flatten the switch for DPBB tunables | Marek Olšák | 2019-06-24 | 1 | -14/+4
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* radeonsi: set the calling convention for inlined function calls | Marek Olšák | 2019-06-24 | 2 | -2/+2
  Otherwise the behavior is undefined.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Dieter Nützel <[email protected]>
* radeonsi: refactor si_update_vgt_shader_config | Nicolai Hähnle | 2019-06-24 | 2 | -28/+60
  We'll have to extend this at some point, and using a bitfield union in this way makes it easier to get the right index without excessive branching.
  Tested-by: Dieter Nützel <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
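The commit message only names the technique. As a rough sketch of it (the field names and the number of bits are hypothetical, not the real radeonsi union), a union overlays named bitfields on an integer so that a combination of stage flags becomes a table index without a chain of branches:

```c
#include <assert.h>

/* Hypothetical key; the real radeonsi union has different fields. */
union vgt_shader_key {
   struct {
      unsigned tess : 1; /* tessellation stages enabled */
      unsigned gs : 1;   /* geometry shader enabled */
      unsigned ngg : 1;  /* illustrative extra bit for a future extension */
   } u;
   unsigned index;       /* the same bits viewed as one integer */
};

/* Builds the key from flags and returns it as a table index, without branching. */
static unsigned vgt_config_index(int tess, int gs, int ngg)
{
   union vgt_shader_key key;

   assert(sizeof(key.u) == sizeof(key.index)); /* index covers every bit of the key */
   key.index = 0;        /* zero all bits, including unused ones */
   key.u.tess = !!tess;
   key.u.gs = !!gs;
   key.u.ngg = !!ngg;
   return key.index;     /* e.g. indexes a precomputed table of register values */
}
```

The bit order inside the integer is implementation-defined; with GCC and Clang on little-endian targets the first field lands in the least significant bit, so any table indexed this way has to be built with the same layout.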
* amd/common: lower bitfield_extract to ubfe/ibfe. | Daniel Schürmann | 2019-06-24 | 1 | -0/+1
  Reviewed-by: Connor Abbott <[email protected]>
* amd/common: lower bitfield_insert to bfm & bitfield_select | Daniel Schürmann | 2019-06-24 | 1 | -0/+1
  Reviewed-by: Connor Abbott <[email protected]>
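For reference, the arithmetic behind these two lowerings can be modeled in plain C. This is only a sketch of the semantics, assuming 32-bit operands and offset + bits <= 32; the helper names mirror the NIR opcode names but are not Mesa functions:

```c
#include <assert.h>
#include <stdint.h>

/* Models bfm: a mask of `bits` ones starting at bit `offset`. */
static uint32_t bfm(unsigned bits, unsigned offset)
{
   assert(bits <= 32 && offset < 32 && bits + offset <= 32);
   uint32_t ones = bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
   return ones << offset;
}

/* Models bitfield_select: bits of `insert` where mask is set, bits of `base` elsewhere. */
static uint32_t bitfield_select(uint32_t mask, uint32_t insert, uint32_t base)
{
   return (mask & insert) | (~mask & base);
}

/* bitfield_insert expressed through bfm + bitfield_select. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                unsigned offset, unsigned bits)
{
   return bitfield_select(bfm(bits, offset), insert << offset, base);
}

/* Models ubfe: unsigned extract of `bits` bits starting at `offset`. */
static uint32_t ubfe(uint32_t value, unsigned offset, unsigned bits)
{
   assert(offset < 32);
   return (value >> offset) & bfm(bits, 0);
}

/* Models ibfe: move the field to the top, then shift it back down with sign
 * extension. Assumes the usual two's-complement conversion and arithmetic
 * right shift of GCC/Clang (both are implementation-defined in C). */
static int32_t ibfe(uint32_t value, unsigned offset, unsigned bits)
{
   assert(bits > 0 && offset < 32 && bits + offset <= 32);
   uint32_t top = value << (32 - offset - bits);
   return (int32_t)top >> (32 - bits);
}
```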
* ac/rtld: check correct LDS max size | Marek Olšák | 2019-06-19 | 1 | -0/+3
  Tested-by: Dieter Nützel <[email protected]>
* radeonsi: add s_sethalt to shaders for debugging | Nicolai Hähnle | 2019-06-19 | 2 | -0/+4
  Tested-by: Dieter Nützel <[email protected]>
* ac,radeonsi: Always mark buffer stores as inaccessiblememonly | Connor Abbott | 2019-06-19 | 3 | -19/+19
  inaccessiblememonly means that the marked function doesn't modify memory accessible via normal LLVM pointers. This lets LLVM's dead store elimination, memcpy forwarding, etc. ignore functions with this attribute. We don't represent descriptors as pointers, so this property is always true of buffer and image stores. There are plans to represent descriptors via pointers, but that would simply mean that nothing is inaccessiblememonly any more, since LLVM would then understand loads/stores via its usual alias analysis.
  Radeonsi was mistakenly only setting it if the driver could prove that there were no reads, and then it was cargo-culted into ac_llvm_build and ac_llvm_to_nir. Rip it out of everything.
  Statistics with NIR enabled:
  Totals from affected shaders:
    SGPRS: 152 -> 152 (0.00 %)
    VGPRS: 128 -> 132 (3.12 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 9324 -> 9244 (-0.86 %) bytes
    LDS: 2 -> 2 (0.00 %) blocks
    Max Waves: 17 -> 17 (0.00 %)
    Wait states: 0 -> 0 (0.00 %)
  The only difference was a manhattan31 shader.
  Acked-by: Timothy Arceri <[email protected]>
  Acked-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix undefined shift in macro definition | Dave Airlie | 2019-06-19 | 1 | -1/+1
  Pointed out by Coverity.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
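The commit does not name the macro, so the following is a generic, hypothetical illustration of the class of bug Coverity flags here: shifting a signed `1` into the sign bit of a 32-bit `int` is undefined behavior in C, whereas shifting an unsigned constant is well defined. The macro names are invented, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macros, not the actual radeonsi definition.
 * BAD_BIT(31) would evaluate 1 << 31 on a 32-bit signed int: undefined behavior.
 *   #define BAD_BIT(n) (1 << (n))
 * Using an unsigned constant keeps the shift well defined for 0 <= n <= 31. */
#define GOOD_BIT(n) (1u << (n))

static uint32_t top_bit(void)
{
   uint32_t bit = GOOD_BIT(31);
   assert(bit != 0);
   return bit;
}
```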
* amd: update addrlib | Marek Olšák | 2019-06-17 | 1 | -0/+24
  Acked-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: reduce MAX_GEOMETRY_OUTPUT_VERTICES | Nicolai Hähnle | 2019-06-17 | 1 | -1/+4
  This fixes piglit [email protected]@gs-max-output on gfx9.
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add radeon_info::is_amdgpu instead of checking drm_major == 3 | Marek Olšák | 2019-06-14 | 6 | -8/+8
  Also includes some cleanup.
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: add radeonsi_debug_disassembly option | Nicolai Hähnle | 2019-06-12 | 2 | -6/+10
  This dumps disassembly to the pipe_debug_callback together with shader stats. It can be used together with shader-db to get the full disassembly of all shaders in the database.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix line splitting in si_shader_dump_assembly | Nicolai Hähnle | 2019-06-12 | 1 | -1/+1
  Compute the count since the start of the current line instead of the count since the start of the disassembly.
  Reviewed-by: Marek Olšák <[email protected]>
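A minimal sketch of the idea behind the fix, assuming the goal is to measure a position relative to its own line: the column is counted from the most recent newline, not from the beginning of the whole buffer. The helper below is hypothetical and is not the code in si_shader_dump_assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: column of text[pos] within its own line. */
static size_t column_in_line(const char *text, size_t pos)
{
   size_t line_start = pos;

   assert(text != NULL);
   /* Walk back to the character after the previous newline (or the buffer start). */
   while (line_start > 0 && text[line_start - 1] != '\n')
      line_start--;
   return pos - line_start;
}
```

Measuring from the start of the buffer instead would return the absolute position, which only matches the column on the very first line.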
* radeonsi: raise the alignment of LDS memory for compute shaders | Nicolai Hähnle | 2019-06-12 | 1 | -1/+1
  This implies that the memory will always be at address 0, which allows LLVM to generate slightly better code.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use an explicit symbol for the LSHS LDS memory | Nicolai Hähnle | 2019-06-12 | 2 | -2/+20
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename lds_{load,store} to lshs_lds_{load,store} | Nicolai Hähnle | 2019-06-12 | 1 | -17/+16
  These functions are now only used in LS/HS shaders (both separate and merged).
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9 | Nicolai Hähnle | 2019-06-12 | 3 | -36/+94
  This will make it easier to use LDS for other purposes in geometry shaders in the future.
  The lifetime of the esgs_ring variable is as follows:
  - declared as [0 x i32] while compiling shader parts or monolithic shaders
  - just before uploading, gfx9_get_gs_info computes (among other things) the final ESGS ring size (this depends on both the ES and the GS shader)
  - during upload, the "esgs_ring" symbol is given to ac_rtld as a shared LDS symbol, which will lead to correctly laying out the LDS including other LDS objects that may be defined in the future
  - si_shader_gs uses shader->config.lds_size as the LDS size
  This change depends on the LLVM changes for emitting LDS symbols into the ELF file.
  Reviewed-by: Marek Olšák <[email protected]>
* amd/rtld: layout and relocate LDS symbols | Nicolai Hähnle | 2019-06-12 | 5 | -33/+66
  Upcoming changes to LLVM will emit LDS objects as symbols in the ELF symbol table, with relocations that will be resolved with this change.
  Callers will also be able to define LDS symbols that are shared between shader parts. This will be used by radeonsi for the ESGS ring in gfx9+ merged shaders.
  Reviewed-by: Marek Olšák <[email protected]>
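To make "layout" concrete, here is a hypothetical sketch of assigning offsets to LDS symbols: each symbol is placed at the next offset that satisfies its alignment, and the end of the last symbol gives the LDS size. The struct and function names are invented for illustration and do not reflect ac_rtld's actual data structures or its ordering rules for shared symbols.

```c
#include <assert.h>
#include <stdint.h>

struct lds_symbol {   /* hypothetical */
   const char *name;
   uint32_t size;     /* bytes */
   uint32_t align;    /* bytes, must be a power of two */
   uint32_t offset;   /* filled in by layout_lds() */
};

/* Rounds value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (value + align - 1) & ~(align - 1);
}

/* Lays symbols out in array order; returns the total LDS bytes used. */
static uint32_t layout_lds(struct lds_symbol *syms, unsigned count)
{
   uint32_t end = 0;

   for (unsigned i = 0; i < count; i++) {
      syms[i].offset = align_up(end, syms[i].align);
      end = syms[i].offset + syms[i].size;
   }
   return end;
}
```

For example, a 100-byte symbol with 4-byte alignment followed by a 16-byte symbol with 64-byte alignment ends up at offsets 0 and 128, for 144 bytes in total; the alignment padding is why symbol order can affect the final size.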
* radeonsi: cleanup some #includes | Nicolai Hähnle | 2019-06-12 | 1 | -2/+2
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: inline si_shader_binary_read_config into its only caller | Nicolai Hähnle | 2019-06-12 | 2 | -16/+7
  Since it can only be used for reading the config of an individual, non-combined shader, it is not very reusable anyway.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use the new run-time linker for shaders | Nicolai Hähnle | 2019-06-12 | 9 | -237/+272
  v2:
  - fix a memory leak
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't declare pointers to static strings | Nicolai Hähnle | 2019-06-12 | 1 | -2/+2
  The compiler should be able to optimize them away, but there's still no point in declaring those as pointers, and if the compiler *doesn't* optimize them away, they add unnecessary load-time relocations.
  Reviewed-by: Marek Olšák <[email protected]>
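The distinction can be seen in a small example. A `static const char *` is a separate pointer object that has to be initialized with the string's address (in position-independent code that initialization typically needs a load-time relocation), whereas a `static const char []` is the string data itself. The variable names are illustrative, not the ones changed in the commit.

```c
#include <assert.h>
#include <string.h>

/* Pointer: a separate object holding the address of a string literal. */
static const char *const name_ptr = "radeonsi";
/* Array: the characters themselves; no extra pointer object to initialize. */
static const char name_arr[] = "radeonsi";

static size_t name_len(void)
{
   assert(strcmp(name_ptr, name_arr) == 0); /* same contents either way */
   return sizeof(name_arr) - 1;             /* length known at compile time */
}
```

A side benefit of the array form is that `sizeof` yields the string size at compile time, while `sizeof` on the pointer only yields the pointer size.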
* radeonsi: dump shader binary buffer contents | Nicolai Hähnle | 2019-06-12 | 2 | -0/+19
  This helps identify bugs related to corruption of shaders in memory, or errors in shader upload / rtld.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: return bool from si_shader_binary_upload | Nicolai Hähnle | 2019-06-12 | 4 | -19/+16
  We didn't really use error codes anyway.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: let si_shader_create return a boolean | Nicolai Hähnle | 2019-06-12 | 4 | -16/+14
  We didn't really use error codes anyway.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use ac_shader_config | Nicolai Hähnle | 2019-06-12 | 3 | -126/+25
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't test SDMA perf if SDMA is disabled/unsupported | Marek Olšák | 2019-06-11 | 1 | -0/+3
* radeonsi: always interpolate PrimID as flat | Marek Olšák | 2019-06-11 | 1 | -1/+2
* radeonsi: move color clamping to si_llvm_export_vs to unify the code | Marek Olšák | 2019-06-11 | 1 | -80/+67
* radeonsi: use the ac helper for index buffer stores in the culling shader | Marek Olšák | 2019-06-11 | 2 | -13/+10
* radeonsi: use the ac helper for image stores | Marek Olšák | 2019-06-11 | 1 | -29/+6
* radeonsi: use the ac helper for SSBO stores | Marek Olšák | 2019-06-11 | 1 | -22/+16
* radeonsi: fixes for vec3 buffer stores in LLVM 9 | Marek Olšák | 2019-06-11 | 1 | -3/+10
* radeonsi: Don't force dcc disable for loads | Connor Abbott | 2019-06-06 | 2 | -13/+0
  When e9d935ed0e2 added force_dcc_off(), we forced DCC off for any preloaded image descriptor that had stores associated with it, since the same preloaded descriptors were used for loads and stores. However, when the preloading was removed in 16be87c9042, the existing logic was kept even though it was no longer necessary. The comment above force_dcc_off() only mentions stores, so only force DCC off for stores.
  Cc: Nicolai Hähnle <[email protected]>
  Cc: Marek Olšák <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable NIR's lower_fmod option. | Kenneth Graunke | 2019-06-05 | 1 | -0/+1
  Currently, st/mesa always calls the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations are lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers.
  The AMD NIR backend also has code to handle fmod, so we could potentially skip this and still be fine. I don't have an opinion on that.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Fix type in bindless address computation | Connor Abbott | 2019-06-04 | 1 | -2/+2
  Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.
  Reviewed-by: Marek Olšák <[email protected]>
* amd/common: use generated register header | Nicolai Hähnle | 2019-06-03 | 8 | -9/+6
* amd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0 | Nicolai Hähnle | 2019-06-03 | 1 | -5/+5
  The automatic header generation unifies identical registers in a series and only emits definitions for the first one. This is mostly to avoid emitting excessive definitions for CB registers, but special-casing an exception for this family of registers doesn't seem worth it.
* amd/common: unify PITCH_GFX6 and PITCH_GFX9 | Nicolai Hähnle | 2019-06-03 | 2 | -7/+7
  The definition of the fields differs, but PITCH_GFX9 is a mere extension of PITCH_GFX6 that does not conflict with any other fields.
  This aligns the definitions with what will be generated from the register JSON. The information about how large the fields really are is preserved in the register database.
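Register fields in these headers follow a set/get/clear macro pattern (`S_`, `G_` and `C_` prefixes). The sketch below uses a hypothetical register `0x001234` with a 16-bit PITCH field at bits [15:0]; the address and widths are assumptions, not the real PITCH_GFX6/PITCH_GFX9 layout. It shows why one definition with the wider mask can serve both generations: any value that fits an older, narrower field (say 14 bits) encodes identically, provided the extra bits do not overlap another field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register 0x001234 with a 16-bit PITCH field at bits [15:0]. */
#define S_001234_PITCH(x) (((uint32_t)(x) & 0xFFFF) << 0)  /* set: place value in field */
#define G_001234_PITCH(x) (((uint32_t)(x) >> 0) & 0xFFFF)  /* get: extract field */
#define C_001234_PITCH    0xFFFF0000u                       /* clear: mask of all other bits */

/* Replaces only the PITCH field of an existing register value. */
static uint32_t set_pitch(uint32_t reg, uint32_t pitch)
{
   assert(pitch <= 0xFFFF);
   return (reg & C_001234_PITCH) | S_001234_PITCH(pitch);
}
```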
* amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names | Nicolai Hähnle | 2019-06-03 | 2 | -8/+8
  The field layout wasn't actually changed in gfx9, so having the suffix isn't very useful. The field *contents* were changed, but this is reflected in the V_xxx_xxx definitions and is taken into account by the ac_debug logic based on the register JSON.
  This aligns the definitions with what will be generated from the register JSON.
* radeonsi: init sctx->dma_copy before using it | Pierre-Eric Pelloux-Prayer | 2019-06-03 | 1 | -3/+3
  Commit a1378639ab19 reordered the context function initializations but broke the sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma. In this case sctx->dma_copy was assigned a value only after being used in:
    sctx->b.resource_copy_region = sctx->dma_copy;
  This commit moves the FORCE_DMA special case after the sctx->dma_copy initialization.
  See https://bugs.freedesktop.org/show_bug.cgi?id=110422
  Signed-off-by: Marek Olšák <[email protected]>
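The ordering problem is the usual one with function-pointer tables: copying a pointer before it has been assigned copies whatever was there before (NULL in a zeroed context). Below is a reduced, hypothetical model of the corrected order; the struct and function names are invented and far simpler than the real si_context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*copy_fn)(void);

/* Stand-ins for the real copy implementations (hypothetical). */
static void dma_copy_impl(void) {}
static void blit_copy_impl(void) {}

struct fake_context {          /* hypothetical, much smaller than si_context */
   copy_fn dma_copy;
   copy_fn resource_copy_region;
};

static struct fake_context init_context(bool force_dma)
{
   struct fake_context ctx = {0};

   ctx.resource_copy_region = blit_copy_impl;
   ctx.dma_copy = dma_copy_impl;        /* assign the DMA path first ... */
   if (force_dma)                       /* ... then apply the forcedma override */
      ctx.resource_copy_region = ctx.dma_copy;

   assert(ctx.resource_copy_region != NULL);
   return ctx;
}
```

Swapping the two middle steps reproduces the bug: with `force_dma` set, `resource_copy_region` would end up NULL.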
* ac: use amdgpu-flat-work-group-size | Marek Olšák | 2019-06-03 | 1 | -5/+2
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/nir: Remove hack for builtins | Connor Abbott | 2019-05-31 | 1 | -11/+2
  We now bounds-check properly in the uniform loading fast path, so there's no need to disable it by pretending there are other UBO bindings in use. The way the hack looks at the variable name was causing problems when two piglit shaders, one with a name that triggered the hack and one that didn't, got hashed to the same thing after stripping out the names.
  Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: Use correct location for uniform access bound | Connor Abbott | 2019-05-31 | 1 | -1/+1
  location is the API-level location, but driver_location is the actual location the uniform gets passed to the driver. This apparently only caused failures with builtins, where the location is 0 because it's represented via the state tokens instead.
  Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: Correctly handle double TCS/TES varyings | Connor Abbott | 2019-05-31 | 1 | -4/+28
  ac expands the store to 32-bit components for us, but we still have to deal with storing up to 8 components, and when a varying is split across two vec4 slots we have to calculate the address again for the second slot, since they aren't adjacent in memory. I didn't do this at the ac level because we should generate better indexing arithmetic for the LDS store, where slots are contiguous.
  Reviewed-by: Timothy Arceri <[email protected]>
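A rough model of the addressing problem, under hypothetical assumptions: a 64-bit varying of up to four doubles occupies up to 8 32-bit components, and therefore up to two vec4 slots, and consecutive slots are assumed to be `slot_stride` dwords apart rather than adjacent. Each component's dword address is computed from its own slot instead of continuing linearly from the first slot's base. The function and parameter names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: slot s starts at base + s * slot_stride dwords. */
static uint32_t component_address(uint32_t base, uint32_t slot_stride,
                                  unsigned first_slot, unsigned first_comp,
                                  unsigned i)
{
   assert(first_comp < 4 && i < 8);    /* at most 8 32-bit components (4 doubles) */
   unsigned c = first_comp + i;        /* component index counted from first_slot */
   unsigned slot = first_slot + c / 4; /* components 4..7 fall into the next slot */
   return base + slot * slot_stride + c % 4;
}
```

With a stride of 16 dwords and a dvec3 starting at slot 1, component 3 lands at dword 19, but component 4 lands at dword 32, not 20, which is exactly the case that needs a fresh address computation.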
* radeonsi: fix timestamp queries for compute-only contexts | Marek Olšák | 2019-05-29 | 1 | -3/+5
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Tested-by: Jan Vesely <[email protected]>