path: root/src/gallium/drivers/radeonsi/si_state_shaders.c
Commit message | Author | Age | Files | Lines
* radeonsi: enable distributed tess on multi-SE parts only | Marek Olšák | 2016-06-29 | 1 | -1/+1
    ported from Vulkan

    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set optimal VGT_HS_OFFCHIP_PARAM | Marek Olšák | 2016-06-29 | 1 | -10/+39
    ported from Vulkan

    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use r600_resource_reference | Marek Olšák | 2016-06-25 | 1 | -3/+1
    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Vedran Miletić <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use trapezoid distribution for tess on Fiji and Polaris | Nicolai Hähnle | 2016-06-20 | 1 | -3/+7
    This yields a small performance improvement in Unigine Heaven.

    Reviewed-by: Alex Deucher <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Don't offset OFFCHIP_BUFFERING on pre-VI cards. | Bas Nieuwenhuizen | 2016-05-30 | 1 | -2/+6
    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96239
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: always reserve output space for tess factors | Marek Olšák | 2016-05-27 | 1 | -1/+6
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Dave Airlie <[email protected]>
* radeonsi: Allow TES distribution between shader engines. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -15/+24
    The R_028B50_VGT_TESS_DISTRIBUTION value is copied from amdgpu-pro.
    Smaller values in the ACCUM fields seem to decrease the performance
    advantage from this patch; higher values don't seem to matter.

    v2: Add distribution mode field enums.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Enable dynamic HS. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -1/+1
    This allows running the TES on different CUs than the TCS, which
    results in performance improvements.

    v2: Only write the control word from one invocation.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Store inputs to memory when not using a TCS. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -0/+3
    We need to copy the VS outputs to memory. I decided to do this using
    a shader key, as the value depends on other shaders.

    I also switch the fixed function TCS over to monolithic, as otherwise
    many of the user SGPRs need to be passed to the epilog, which
    increases register pressure, or complexity to avoid that. The main
    body of the fixed function TCS is not that interesting to precompile
    anyway, since we do it on demand and it is very small.

    v2: Use u_bit_scan64.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add offchip tessellation parameters. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -0/+9
    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Add buffer for offchip storage between TCS and TES. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -0/+18
    The buffer is quite large, but should only be allocated if the
    application uses tessellation. Most non-games don't.

    v2:
    - Use the correct register for SI.
    - Add define for block size.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Change default behaviour for undefined COLOR0 | Axel Davy | 2016-05-18 | 1 | -0/+3
    D3D9 needs COLOR0 to be 1.0 on all channels when undefined.
    0.0 for the others is fine.
    GL behaviour is undefined.

    Signed-off-by: Axel Davy <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: always allocate export memory for pixel shaders | Nicolai Hähnle | 2016-05-09 | 1 | -5/+10
    Experiments with framebuffer-no-attachments type draw calls have
    shown that NULL exports stall terribly unless we ensure that export
    memory is allocated by the SPI.

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix undefined behavior (memcpy arguments must be non-NULL) | Nicolai Hähnle | 2016-05-07 | 1 | -1/+3
    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium: remove helpers converting to/from TGSI_PROCESSOR_* | Marek Olšák | 2016-04-22 | 1 | -1/+1
    Acked-by: Jose Fonseca <[email protected]>
* gallium: use PIPE_SHADER_* everywhere, remove TGSI_PROCESSOR_* | Marek Olšák | 2016-04-22 | 1 | -7/+7
    Acked-by: Jose Fonseca <[email protected]>
* radeonsi: remove the shader parameter from si_set_ring_buffer | Marek Olšák | 2016-04-22 | 1 | -10/+9
    It is not used anymore.
    This is a follow-up to the RW buffer cleanup.

    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: move default tess level constant buffer to RW buffers | Marek Olšák | 2016-04-22 | 1 | -8/+7
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename and rearrange RW buffer slots | Marek Olšák | 2016-04-22 | 1 | -8/+8
    - use an enum
    - use a unique slot number regardless of the shader stage
      (the per-stage slots will go away for RW buffers)

    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add config parameter to si_shader_apply_scratch_relocs. | Bas Nieuwenhuizen | 2016-04-21 | 1 | -1/+1
    shader->config is not updated for compute kernels.

    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* radeonsi: fold num_user_sgprs where it is possible | Marek Olšák | 2016-04-14 | 1 | -16/+4
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: fix SGPRS calculation once more | Marek Olšák | 2016-04-14 | 1 | -55/+12
    This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
    which bumped NUM_USER_SGPRS and uncovered this bug on SI.

    If this was fixed in LLVM, these workarounds wouldn't be needed. LLVM
    would have to look at the calling convention to know how many SGPR
    inputs are declared, and add VCC and the scratch wave offset (which is
    enabled even if we spill SGPRs but not VGPRs, oh well).

    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: move scissor and viewport states into gallium/radeon | Marek Olšák | 2016-04-12 | 1 | -22/+3
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Grigori Goronzy <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable early Z if the fragment shader writes to memory | Nicolai Hähnle | 2016-03-21 | 1 | -2/+12
    Empirically, both the EXEC_ON_* flags and LATE_Z are necessary.

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: process TGSI property NEXT_SHADER | Marek Olšák | 2016-03-19 | 1 | -0/+27
    This allows compiling the main shader part as ES or LS. If we get
    the correct hint, non-separable GLSL shaders no longer have to be
    compiled as VS first, followed by LS or ES compiled on demand.

    The result is that fewer shaders are compiled by piglit, but it
    doesn't improve piglit running time.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set DEPTH_BEFORE_SHADER based on FS_EARLY_DEPTH_STENCIL | Nicolai Hähnle | 2016-03-14 | 1 | -0/+3
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use re-Z | Marek Olšák | 2016-03-01 | 1 | -3/+17
    This can increase perf for shaders that kill pixels (kill, alpha-test,
    alpha-to-coverage).

    v2: add comments

    Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: implement binary shaders & shader cache in memory (v2) | Marek Olšák | 2016-02-21 | 1 | -4/+235
    v2: handle _mesa_hash_table_insert failure
        other cosmetic changes

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move some struct si_shader members to new struct si_shader_info | Marek Olšák | 2016-02-21 | 1 | -9/+9
    This will be part of shader binaries.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: compile non-GS middle parts of shaders immediately if enabled | Marek Olšák | 2016-02-21 | 1 | -6/+30
    Still disabled.

    Only prologs & epilogs are compiled in draw calls, but each variant of
    those is compiled only once per process.

    VS is always compiled as hw VS.
    TES is always compiled as hw VS.
    LS and ES stages are always compiled on demand.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add PS prolog | Marek Olšák | 2016-02-21 | 1 | -0/+7
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate out shader key bits for prologs & epilogs | Marek Olšák | 2016-02-21 | 1 | -44/+47
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable denorms for 64-bit and 16-bit floats | Marek Olšák | 2016-02-09 | 1 | -6/+12
    This fixes FP16 conversion instructions for VI, which has 16-bit
    floats, but not SI & CI, which can't disable denorms for those
    instructions.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: compile geometry shaders immediately | Marek Olšák | 2016-02-09 | 1 | -1/+2
    they have only 1 variant

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split out code for deleting si_shader | Marek Olšák | 2016-02-09 | 1 | -29/+36
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove useless code that handles dx10_clamp_mode | Marek Olšák | 2016-02-09 | 1 | -6/+6
    "enable-no-nans-fp-math" is a wrong string and there was a
    disagreement about fixing it.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: read SPI_PS_INPUT_ADDR from LLVM if it returns it | Marek Olšák | 2016-02-09 | 1 | -1/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement forcing per-sample_interpolation using the shader key only | Marek Olšák | 2016-02-09 | 1 | -83/+24
    It was partly a state and partly emulated by shader code, but since
    we want to do this in a fragment shader prolog, we need to put it
    into the shader key, which will be used to generate the prolog.

    This also removes the spi_ps_input states and moves the registers
    to the PS state.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move BCOLOR PS input locations after all other inputs | Marek Olšák | 2016-02-09 | 1 | -15/+37
    BCOLOR inputs were immediately after COLOR inputs. Thus, all following
    inputs were offset by 1 if color_two_side was enabled, and not offset
    if it was not enabled, which is a variation that's problematic if we
    want to have 1 variant per shader and the variant doesn't care about
    color_two_side (that should be handled by other bytecode attached at
    the beginning).

    Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at
    location "num_inputs" if it's present. BCOLOR1 is next.

    This also allows removing si_shader::nparam and
    si_shader::ps_input_param_offset, which are useless now.

    Reviewed-by: Nicolai Hähnle <[email protected]>
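    Editorial illustration of the layout described in the commit above, not
    the driver's code: a minimal C sketch of where the back-color inputs
    land once they follow all other inputs. The helper name
    bcolor_location and its parameters are hypothetical. Placing BCOLOR1 at
    "num_inputs" when BCOLOR0 is absent is an assumption; the message only
    says that BCOLOR1 comes next after BCOLOR0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from Mesa): returns the PS input location of
 * back-color input `index` (0 or 1), or -1 if that input is not read.
 * All regular inputs occupy locations 0 .. num_inputs-1, so their
 * locations do not depend on whether two-sided color is enabled. */
static int bcolor_location(unsigned num_inputs, bool reads_color0,
                           bool reads_color1, unsigned index)
{
    assert(index < 2);

    if (index == 0)
        return reads_color0 ? (int)num_inputs : -1;

    if (!reads_color1)
        return -1;

    /* Assumption: BCOLOR1 directly follows BCOLOR0 when both are read,
     * and takes the first free slot when BCOLOR0 is absent. */
    return (int)num_inputs + (reads_color0 ? 1 : 0);
}
```

    With five regular inputs and both colors read, this sketch puts BCOLOR0
    at location 5 and BCOLOR1 at location 6, while inputs 0 to 4 keep the
    same locations with or without color_two_side.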
* radeonsi: move SPI_PS_INPUT_CNTL value computation to a separate function | Marek Olšák | 2016-02-09 | 1 | -34/+40
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: generate a color_two_side variant only if the shader reads colors | Marek Olšák | 2016-02-09 | 1 | -1/+1
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rework RB+ for Stoney | Marek Olšák | 2016-02-02 | 1 | -0/+3
    This fixes it.

    States which also need to be taken into account:
    - SPI color formats - each down-conversion format supports only
      a limited set of SPI formats
    - whether MSAA resolving and logic op are enabled

    These need special handling:
    - blending
    - disabled channels

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename cb_target_mask state to cb_render_state | Marek Olšák | 2016-02-02 | 1 | -1/+1
    and rename a variable in the function.

    SX_PS_DOWNCONVERT will be emitted here.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix shader precompilation for shader-db | Marek Olšák | 2016-01-26 | 1 | -9/+35
    The addition of spi_shader_col_format killed all color outputs in
    precompiled shaders.

    Reviewed-by: Michel Dänzer <[email protected]> (v1)
    Reviewed-by: Nicolai Hähnle <[email protected]> (v1)

    v2: also set the alpha func (trivial)
* radeonsi: replace use of is_gs_copy_shader in si_shader_vs | Nicolai Hähnle | 2016-01-25 | 1 | -1/+1
    We now have an explicit parameter that contains the same information,
    and this will allow us to get rid of is_gs_copy_shader in the
    si_shader struct.

    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: ensure that VGT_GS_MODE is sent when necessary | Nicolai Hähnle | 2016-01-25 | 1 | -8/+21
    Specifically, when the API switches from using a GS to not using a GS
    and then back to using the same GS again, we do not have to re-send
    all the GS state, but we do have to send VGT_GS_MODE. So make
    VGT_GS_MODE consistently be a part of the VS state.

    This fixes a rendering bug in Dolphin, but surely other applications
    are affected as well.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
    Cc: "11.0 11.1" <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract the VGT_GS_MODE calculation into its own function | Nicolai Hähnle | 2016-01-25 | 1 | -19/+28
    Cc: "11.0 11.1" <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* Revert "radeonsi: fix discard-only fragment shaders (v2)" | Nicolai Hähnle | 2016-01-22 | 1 | -4/+0
    This reverts commit 843855bbf0da2204ce536623ba957bfa83fdbd52.

    It became redundant due to Marek's earlier-pushed 8667a1ae, which
    achieves the same thing.
* radeonsi: fix discard-only fragment shaders (v2) | Nicolai Hähnle | 2016-01-22 | 1 | -0/+4
    When a fragment shader is used that has no outputs but does
    conditional discard (KILL_IF), all fragments are killed without
    this patch.

    By comparing various register settings, my conclusion is that the
    exec mask is either not properly forwarded to the DB by NULL exports
    or ends up being unused, at least when there is _only_ a NULL export
    (the ISA documentation claims that NULL exports can be used to
    override a previously exported exec mask).

    Of the various approaches I have tried to work around the problem,
    this one seems to be the least invasive one.

    v2: take discard by alpha test into account as well

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: disable SPI color outputs the shader doesn't write | Marek Olšák | 2016-01-22 | 1 | -0/+12
    Reviewed-by: Nicolai Hähnle <[email protected]>