aboutsummaryrefslogtreecommitdiffstats
path: root/src/amd
Commit message (Collapse)AuthorAgeFilesLines
* radv: remove predication on cache flushesMatthew Nicholls2018-01-314-18/+13
| | | | | | | | | This can lead to a situation where cache flushes could get conditionally disabled while still clearing the flush_bits, and thus flushes due to application pipeline barriers may never get executed. Fixes: a6c2001ace (radv: add support for cmd predication.) Signed-off-by: Dave Airlie <[email protected]>
* ac/radeonsi: add lookup_interp_param and load_sample_position to the abiTimothy Arceri2018-01-312-29/+42
| | | | | | | This will enable the interpolateAt builtins to work on the radeonsi nir backend. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: add prim_mask to the abiTimothy Arceri2018-01-312-6/+6
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: add si_nir_lookup_interp_param() helperTimothy Arceri2018-01-311-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/nir_to_llvm: move some interp defines to the headerTimothy Arceri2018-01-312-4/+5
| | | | | | These will be used in the following patch. Reviewed-by: Marek Olšák <[email protected]>
* radv: Merge raster state with PM4 generation.Bas Nieuwenhuizen2018-01-302-75/+50
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move gs state out of pipeline.Bas Nieuwenhuizen2018-01-302-43/+43
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out cliprect rule generation.Bas Nieuwenhuizen2018-01-302-25/+33
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge VGT_GS_MODE computation with PM4 generation.Bas Nieuwenhuizen2018-01-302-28/+25
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out processing the vertex input state.Bas Nieuwenhuizen2018-01-301-35/+43
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move tessellation state out of pipeline.Bas Nieuwenhuizen2018-01-302-50/+58
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Move blend state out of pipeline.Bas Nieuwenhuizen2018-01-302-67/+72
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out generating VGT_SHADER_STAGES_EN.Bas Nieuwenhuizen2018-01-302-24/+27
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out the ia_multi_vgt_param precomputation.Bas Nieuwenhuizen2018-01-303-91/+106
| | | | | | | | | Also moved everything in a struct and then return the struct from the helper function, so it is clear in the caller what part of the pipeline gets modified. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Split out db_shader_control computation.Bas Nieuwenhuizen2018-01-302-22/+22
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Compute shader_z_format when emitting it.Bas Nieuwenhuizen2018-01-302-8/+3
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge depth stencil state with PM4 generation.Bas Nieuwenhuizen2018-01-302-73/+58
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge ps_input_cntl computation with PM4 generation.Bas Nieuwenhuizen2018-01-302-83/+79
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vtx_reuse_depth computation with PM4 generation.Bas Nieuwenhuizen2018-01-302-8/+6
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge vs state computation with PM4 generation.Bas Nieuwenhuizen2018-01-302-58/+34
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Merge binning state generation with pm4 emission.Bas Nieuwenhuizen2018-01-302-35/+19
| | | | | | | We don't need the pipeline state struct anymore. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Constify some pipeline helpers.Bas Nieuwenhuizen2018-01-302-6/+6
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add PM4 pregeneration for compute pipelines.Bas Nieuwenhuizen2018-01-302-58/+68
| | | | | Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Record a PM4 sequence for graphics pipeline switches.Bas Nieuwenhuizen2018-01-303-451/+483
| | | | | | | | | | | | | This gives about 2% performance improvement on dota2 for me. This is mostly a mechanical copy and replacement, but at bind time we still do: 1) Some stuff that is only based on num_samples changes. 2) Some command buffer state setting. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Determine unneeded dynamic states.Bas Nieuwenhuizen2018-01-303-38/+64
| | | | | | | Which avoids setting or emitting them. Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/llvm: bump the number of results to 8.Dave Airlie2018-01-311-1/+1
| | | | | | | | | | | This function can get access for a 64-bit dvec4, which means we have to load 8 components. This fixes: R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nir: add vs_inputs_dual_locations compiler optionTimothy Arceri2018-01-301-0/+1
| | | | | | | | | | | | | Allows nir drivers to either use a single or dual locations for vs double inputs. i965 uses dual locations for both OpenGL and Vulkan drivers, for now gallium OpenGL drivers only use a single location. The following patch will also make use of this option when calling nir_shader_gather_info(). Reviewed-by: Karol Herbst <[email protected]>
* radv/gfx9: fix block compression texture views. (v2)Dave Airlie2018-01-301-4/+49
| | | | | | | | | | | | | | | | This ports a fix from amdvlk, to fix the sizing for mip levels when block compressed images are viewed using uncompressed views. My original fix didn't power the clamping, but it looks like the clamping is required to stop the sizing going too large. Fixes: dEQP-VK.image.texel_view_compatible.graphic.extended*bc* Doesn't crash DOW3 anymore. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."' Signed-off-by: Dave Airlie <[email protected]>
* radv: Signal fence correctly after sparse binding.Bas Nieuwenhuizen2018-01-291-14/+32
| | | | | | | | | It did not signal syncobjs in the fence, and also signalled too early if there was work on the queue already, as we have to wait till that work is done. Fixes: d27aaae4d2 "radv: Add external fence support." Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix indentationTimothy Arceri2018-01-291-6/+6
| | | | Reviewed-by: Dave Airlie <[email protected]>
* ac: remove unused nir2llvmtype()Timothy Arceri2018-01-291-22/+0
| | | | | | The last use of this was removed in the previous patch. Reviewed-by: Dave Airlie <[email protected]>
* ac: fix gs load inputs typeTimothy Arceri2018-01-291-2/+3
| | | | | | | This fixes the scenario where the input is a struct. With this the Unreal engines Elemental demo now works on radeonsi. Reviewed-by: Dave Airlie <[email protected]>
* ac/nir: call glsl_get_sampler_dim() only once where possibleKai Wasserbäch2018-01-291-8/+11
| | | | | | | | | | Changes since v1: * Rebased on top of e68150de263156a3f3d1b609b6506c5649967f61 and 82adf53308c137ce0dc5f2d5da4e7cc40c5b808c. Signed-off-by: Kai Wasserbäch <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Dieter Nützel <[email protected]>
* ac: rename and move si_const_array into common codeMarek Olšák2018-01-273-13/+16
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: move address space definitions to common codeMarek Olšák2018-01-272-6/+4
| | | | Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: don't use byval LLVM qualifier in shadersMarek Olšák2018-01-274-9/+3
| | | | | | | shader-db doesn't show any regression and 32-bit pointers with byval are declared as VGPRs for some reason. Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for SSBOsSamuel Pitoiset2018-01-261-1/+7
| | | | | | | For descriptors. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set amdgpu.uniform and invariant.load for UBOsSamuel Pitoiset2018-01-261-1/+7
| | | | | | | | | | UBOs are constants buffers. Cc: "18.0" <[email protected]> Fixes: 41c36c45 ("amd/common: use ac_build_buffer_load() for emitting UBO loads") Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Alex Smith <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: set the noalias attribute on input pointersSamuel Pitoiset2018-01-261-0/+1
| | | | | | | | This attribute is similar to the definition of restrict in C99 and it might help LLVM. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: only load used channels when sampling buffer viewsSamuel Pitoiset2018-01-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | This allows to reduce the number of dwords that are loaded with buffer_load_format_xyzw. For example, when the only used channel is 1, the driver will emit buffer_load_format_x instead. Shader stats for DOW3 (with some local hacky scripts for SPIRV): 143 shaders in 143 tests Totals: SGPRS: 5344 -> 5352 (0.15 %) VGPRS: 3476 -> 3452 (-0.69 %) Spilled SGPRs: 30 -> 29 (-3.33 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 269860 -> 269808 (-0.02 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 1267 -> 1272 (0.39 %) Wait states: 0 -> 0 (0.00 %) Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: pass the number of channels to ac_build_buffer_load_format()Samuel Pitoiset2018-01-263-14/+7
| | | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac: add ac_build_buffer_load_common() helperSamuel Pitoiset2018-01-261-21/+40
| | | | | | | | For both versions of llvm.amdgcn.buffer.load.{format}.*. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radv: fix RADV_DEBUG=syncshaders on GFX9Samuel Pitoiset2018-01-261-1/+10
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix a GPU hang with RADV_DEBUG=syncshadersSamuel Pitoiset2018-01-261-8/+7
| | | | | | | | | The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after a dispatch call (and vice versa for graphics). Something has changed in the kernel driver because it used to work. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/shader: scan if fragment shaders write memorySamuel Pitoiset2018-01-265-16/+39
| | | | | | | It's better to do that in ac_shader_info. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/nir: only canonicalize 32-bit float min/max outputs on pre-GFX9Samuel Pitoiset2018-01-261-2/+8
| | | | | | | | | | According to LLVM, only pre-GFX9 targets do not flush denorms for fmin/fmax. All dEQP-VK.glsl.builtin.precision.* still pass. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1)Dave Airlie2018-01-261-1/+2
| | | | | | | | | | | | | | | | | | This seems to be broken, at least the cts tests fail. This fixes: dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_4 dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_8 2 samples seems to pass fine, amdvlk doesn't appear to enable TC for possibly some other reasons here. This is most likely a hack. v1.1: add a bit of explaination text. (Samuel) Fixes: ad3d98da9 (radv: enable tc compatible htile for d32s8 also.) Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/nir: add break statements in needs_view_index_sgpr()Samuel Pitoiset2018-01-251-0/+2
| | | | | | | | | Previous code is correct but as the first case statement uses a break, keep it consistent. CID: 1428579 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add multisample Z optimisation from amdvlkDave Airlie2018-01-251-0/+3
| | | | | | | | This was just found while reading for other stuff, src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp. Reviewed-by: Samuel Pitoiset <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: move spi_baryc_cntl to pipelineDave Airlie2018-01-253-5/+5
| | | | | | | | | | | | We need to enable the pos float location 2 mode anytime we have persample not just when forced by the frag shader. This fixes: dEQP-VK.pipeline.multisample.min_sample_shading* Fixes: 58c97a079 (radv: enable location at sample when persample is forced.) Reviewed-by: Samuel Pitoiset <[email protected]> Signed-off-by: Dave Airlie <[email protected]>