path: root/src/amd
Commit message | Author | Age | Files | Lines
* ac/nir: use ac_build_buffer_load_format for image buffer loads | Marek Olšák | 2018-02-01 | 1 | -8/+13
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add glc parameter to ac_build_buffer_load_format | Marek Olšák | 2018-02-01 | 3 | -3/+5
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: load the right number of components for VS inputs and TBOs | Marek Olšák | 2018-02-01 | 2 | -0/+38
    The supported counts are 1, 2, and 4 (a count of 3 is loaded as 4).
    The following snippet loads float, vec2, vec3, and vec4:
    Before:
        buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904
        buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005
        s_waitcnt vmcnt(0) ; BF8C0F70
        buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
        s_waitcnt vmcnt(0) ; BF8C0F70
        buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507
    After:
        buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04
        buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805
        buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
        s_waitcnt vmcnt(0) ; BF8C0F70
        buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307
    Reviewed-by: Samuel Pitoiset <[email protected]>
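The rounding rule stated in this commit message (supported counts 1, 2, and 4, with 3 treated as 4) can be sketched as a small helper. The function name `vs_input_load_count` is hypothetical and is not taken from the radeonsi source; this only illustrates the rule, not how the driver implements it.

```c
#include <assert.h>

/* Hypothetical helper (not a radeonsi function): map the number of
 * components a shader reads (1..4) to the number of components that
 * are actually loaded. Per the commit message only 1, 2 and 4 are
 * supported, so a request for 3 is rounded up to 4. */
static unsigned vs_input_load_count(unsigned used_components)
{
    assert(used_components >= 1 && used_components <= 4);
    return used_components == 3 ? 4 : used_components;
}
```

This matches the "After" listing: the float uses `buffer_load_format_x`, the vec2 uses `buffer_load_format_xy`, and both the vec3 and the vec4 use `buffer_load_format_xyzw`.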
* radv: do not insert shaders in cache when it's disabled | Samuel Pitoiset | 2018-02-01 | 1 | -5/+24
    When the application doesn't provide its own pipeline cache, the driver uses an in-memory cache, but it shouldn't insert any entries when the cache is explicitly disabled by the user.
    Found while running my experimental pipeline-db tool with a ton of shaders: the memory footprint was just huge, and sometimes the process was even killed...
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: use separate bindings for graphics and compute descriptors | Samuel Pitoiset | 2018-02-01 | 3 | -53/+125
    The Vulkan spec says:
    "pipelineBindPoint is a VkPipelineBindPoint indicating whether the descriptors will be used by graphics pipelines or compute pipelines. There is a separate set of bind points for each of graphics and compute, so binding one does not disturb the other."
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
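The spec wording quoted in this commit message implies that descriptor bindings are tracked per bind point. A minimal sketch of that idea follows; every type, field, and function name here is hypothetical and much simpler than whatever radv actually uses.

```c
#include <stdint.h>

/* Hypothetical, simplified model; none of these names come from radv. */
enum bind_point {
    BIND_POINT_GRAPHICS = 0,
    BIND_POINT_COMPUTE = 1,
    BIND_POINT_COUNT
};

#define MAX_SETS 4

struct descriptor_state {
    const void *sets[MAX_SETS]; /* currently bound descriptor sets */
    uint32_t valid;             /* bitmask of slots that hold a set */
    uint32_t dirty;             /* bitmask of slots changed since last emit */
};

struct cmd_buffer {
    /* One independent binding table per bind point, so binding a
     * compute set never overwrites a graphics set and vice versa. */
    struct descriptor_state descriptors[BIND_POINT_COUNT];
};

static void bind_descriptor_set(struct cmd_buffer *cmd, enum bind_point bp,
                                unsigned idx, const void *set)
{
    struct descriptor_state *state = &cmd->descriptors[bp];

    state->sets[idx] = set;
    state->valid |= 1u << idx;
    state->dirty |= 1u << idx;
}
```

With a single shared table, binding set 0 for compute would replace set 0 for graphics; indexing by bind point avoids that.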
* radv: store the bind point when creating descriptors with templates | Samuel Pitoiset | 2018-02-01 | 2 | -0/+2
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not dump meta shader stats | Samuel Pitoiset | 2018-01-31 | 2 | -21/+18
    They are of little use and pollute the output.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: fix emission of ffract for 64-bit | Samuel Pitoiset | 2018-01-31 | 1 | -7/+16
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove predication on cache flushes | Matthew Nicholls | 2018-01-31 | 4 | -18/+13
    This can lead to a situation where cache flushes could get conditionally disabled while still clearing the flush_bits, and thus flushes due to application pipeline barriers may never get executed.
    Fixes: a6c2001ace (radv: add support for cmd predication.)
    Signed-off-by: Dave Airlie <[email protected]>
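The failure mode described in this commit message can be modelled in a few lines. This is a toy model under stated assumptions, not radv code: `struct cmd_state`, `flushes_done`, and `emit_cache_flush` are hypothetical names, and the GPU predicate is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model; the names below are illustrative, not taken from radv. */
struct cmd_state {
    uint32_t flush_bits;   /* flushes still pending, tracked on the CPU */
    uint32_t flushes_done; /* flushes the (modelled) GPU really executed */
};

/* Emit the pending flushes. When the packets are predicated, the GPU
 * skips them if the predicate evaluates to false, yet the CPU-side
 * bookkeeping clears flush_bits unconditionally. */
static void emit_cache_flush(struct cmd_state *s, bool predicated,
                             bool gpu_predicate)
{
    if (!predicated || gpu_predicate)
        s->flushes_done |= s->flush_bits;

    s->flush_bits = 0; /* cleared even if the GPU skipped the flush */
}
```

In the predicated case with a false predicate, `flush_bits` ends up zero while `flushes_done` is also zero, so the flush is lost. Emitting flushes without predication makes the two stay consistent.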
* ac/radeonsi: add lookup_interp_param and load_sample_position to the abi | Timothy Arceri | 2018-01-31 | 2 | -29/+42
    This will enable the interpolateAt builtins to work on the radeonsi nir backend.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: add prim_mask to the abi | Timothy Arceri | 2018-01-31 | 2 | -6/+6
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: add si_nir_lookup_interp_param() helper | Timothy Arceri | 2018-01-31 | 1 | -0/+2
    Reviewed-by: Marek Olšák <[email protected]>
* ac/nir_to_llvm: move some interp defines to the header | Timothy Arceri | 2018-01-31 | 2 | -4/+5
    These will be used in the following patch.
    Reviewed-by: Marek Olšák <[email protected]>
* radv: Merge raster state with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -75/+50
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move gs state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -43/+43
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out cliprect rule generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -25/+33
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge VGT_GS_MODE computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -28/+25
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out processing the vertex input state. | Bas Nieuwenhuizen | 2018-01-30 | 1 | -35/+43
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move tessellation state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -50/+58
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move blend state out of pipeline. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -67/+72
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out generating VGT_SHADER_STAGES_EN. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -24/+27
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out the ia_multi_vgt_param precomputation. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -91/+106
    Also move everything into a struct and return that struct from the helper function, so it is clear in the caller which part of the pipeline gets modified.
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out db_shader_control computation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -22/+22
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Compute shader_z_format when emitting it. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -8/+3
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge depth stencil state with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -73/+58
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge ps_input_cntl computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -83/+79
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vtx_reuse_depth computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -8/+6
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vs state computation with PM4 generation. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+34
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge binning state generation with pm4 emission. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -35/+19
    We don't need the pipeline state struct anymore.
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Constify some pipeline helpers. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -6/+6
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add PM4 pregeneration for compute pipelines. | Bas Nieuwenhuizen | 2018-01-30 | 2 | -58/+68
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Record a PM4 sequence for graphics pipeline switches. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -451/+483
    This gives about 2% performance improvement on dota2 for me.
    This is mostly a mechanical copy and replacement, but at bind time we still do:
    1) Some stuff that is only based on num_samples changes.
    2) Some command buffer state setting.
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Determine unneeded dynamic states. | Bas Nieuwenhuizen | 2018-01-30 | 3 | -38/+64
    Which avoids setting or emitting them.
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/llvm: bump the number of results to 8. | Dave Airlie | 2018-01-31 | 1 | -1/+1
    This function can be called for a 64-bit dvec4, which means we have to load 8 components.
    This fixes:
    R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto
    Reviewed-by: Timothy Arceri <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
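The figure of 8 in this commit message follows from counting 32-bit components: a dvec4 has four 64-bit elements, and each one occupies two 32-bit slots. A minimal sketch of that arithmetic, using a hypothetical helper name and assuming only 32-bit and 64-bit element sizes:

```c
/* Hypothetical helper (not from the ac/llvm code): number of 32-bit
 * components needed to hold a vector of num_components elements that
 * are each bit_size bits wide. Assumes bit_size is 32 or 64. */
static unsigned num_32bit_components(unsigned num_components,
                                     unsigned bit_size)
{
    return num_components * (bit_size / 32);
}
```

A vec4 needs 4 * (32 / 32) = 4 results, while a dvec4 needs 4 * (64 / 32) = 8, which is why a result array sized for 4 is too small.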
* nir: add vs_inputs_dual_locations compiler option | Timothy Arceri | 2018-01-30 | 1 | -0/+1
    Allows NIR drivers to use either single or dual locations for VS double inputs. i965 uses dual locations for both its OpenGL and Vulkan drivers; for now, gallium OpenGL drivers only use a single location.
    The following patch will also make use of this option when calling nir_shader_gather_info().
    Reviewed-by: Karol Herbst <[email protected]>
* radv/gfx9: fix block compression texture views. (v2) | Dave Airlie | 2018-01-30 | 1 | -4/+49
    This ports a fix from amdvlk, to fix the sizing for mip levels when block compressed images are viewed using uncompressed views.
    My original fix didn't port the clamping, but it looks like the clamping is required to stop the sizing going too large.
    Fixes: dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
    Doesn't crash DOW3 anymore.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
    Signed-off-by: Dave Airlie <[email protected]>
* radv: Signal fence correctly after sparse binding. | Bas Nieuwenhuizen | 2018-01-29 | 1 | -14/+32
    It did not signal syncobjs in the fence, and also signalled too early if there was work on the queue already, as we have to wait till that work is done.
    Fixes: d27aaae4d2 "radv: Add external fence support."
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix indentation | Timothy Arceri | 2018-01-29 | 1 | -6/+6
    Reviewed-by: Dave Airlie <[email protected]>
* ac: remove unused nir2llvmtype() | Timothy Arceri | 2018-01-29 | 1 | -22/+0
    The last use of this was removed in the previous patch.
    Reviewed-by: Dave Airlie <[email protected]>
* ac: fix gs load inputs type | Timothy Arceri | 2018-01-29 | 1 | -2/+3
    This fixes the scenario where the input is a struct. With this, the Unreal Engine Elemental demo now works on radeonsi.
    Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: call glsl_get_sampler_dim() only once where possible | Kai Wasserbäch | 2018-01-29 | 1 | -8/+11
    Changes since v1:
      * Rebased on top of e68150de263156a3f3d1b609b6506c5649967f61 and 82adf53308c137ce0dc5f2d5da4e7cc40c5b808c.
    Signed-off-by: Kai Wasserbäch <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
* ac: rename and move si_const_array into common code | Marek Olšák | 2018-01-27 | 3 | -13/+16
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move address space definitions to common code | Marek Olšák | 2018-01-27 | 2 | -6/+4
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: don't use byval LLVM qualifier in shaders | Marek Olšák | 2018-01-27 | 4 | -9/+3
    shader-db doesn't show any regression and 32-bit pointers with byval are declared as VGPRs for some reason.
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for SSBOs | Samuel Pitoiset | 2018-01-26 | 1 | -1/+7
    For descriptors.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for UBOs | Samuel Pitoiset | 2018-01-26 | 1 | -1/+7
    UBOs are constant buffers.
    Cc: "18.0" <[email protected]>
    Fixes: 41c36c45 ("amd/common: use ac_build_buffer_load() for emitting UBO loads")
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Tested-by: Alex Smith <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set the noalias attribute on input pointers | Samuel Pitoiset | 2018-01-26 | 1 | -0/+1
    This attribute is similar to the definition of restrict in C99 and it might help LLVM.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: only load used channels when sampling buffer views | Samuel Pitoiset | 2018-01-26 | 1 | -1/+4
    This reduces the number of dwords that are loaded with buffer_load_format_xyzw. For example, when only the first channel is used, the driver will emit buffer_load_format_x instead.
    Shader stats for DOW3 (with some local hacky scripts for SPIRV):
    143 shaders in 143 tests
    Totals:
        SGPRS: 5344 -> 5352 (0.15 %)
        VGPRS: 3476 -> 3452 (-0.69 %)
        Spilled SGPRs: 30 -> 29 (-3.33 %)
        Spilled VGPRs: 0 -> 0 (0.00 %)
        Private memory VGPRs: 0 -> 0 (0.00 %)
        Scratch size: 0 -> 0 (0.00 %) dwords per thread
        Code Size: 269860 -> 269808 (-0.02 %) bytes
        LDS: 0 -> 0 (0.00 %) blocks
        Max Waves: 1267 -> 1272 (0.39 %)
        Wait states: 0 -> 0 (0.00 %)
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
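One way to picture the channel trimming described in this commit message: the `buffer_load_format_*` variants load a leading run of channels (x, xy, xyz, or xyzw), so the load only has to reach the highest channel that is actually read. The sketch below assumes that reading of the commit; the helper names and the mask convention (bit 0 = x, bit 3 = w) are illustrative and not taken from the ac code.

```c
#include <stddef.h>
#include <string.h>

/* Position of the highest set bit plus one; 0 for an empty mask. */
static unsigned last_bit(unsigned mask)
{
    unsigned n = 0;

    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Hypothetical helper: pick the narrowest format load that still covers
 * every channel in used_mask (bit 0 = x ... bit 3 = w). Returns NULL
 * when no channel is used. */
static const char *buffer_load_format_opcode(unsigned used_mask)
{
    static const char *const names[] = {
        NULL,
        "buffer_load_format_x",
        "buffer_load_format_xy",
        "buffer_load_format_xyz",
        "buffer_load_format_xyzw",
    };

    return names[last_bit(used_mask & 0xf)];
}
```

Because the loads cover a prefix of the channels, a shader that reads only z still needs the xyz variant; the saving comes from trailing channels that are never read.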
* ac: pass the number of channels to ac_build_buffer_load_format() | Samuel Pitoiset | 2018-01-26 | 3 | -14/+7
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: add ac_build_buffer_load_common() helper | Samuel Pitoiset | 2018-01-26 | 1 | -21/+40
    For both versions of llvm.amdgcn.buffer.load.{format}.*.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>