path: root/src/gallium/drivers/radeonsi
Commit message | Author | Age | Files | Lines
* radeonsi/gfx9: update the summary of shader stage configs | Marek Olšák | 2017-04-28 | 1 | -4/+9
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: adjust the signature of si_get_vs_prolog_key | Marek Olšák | 2017-04-28 | 1 | -9/+21
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate out VS prolog key generation | Marek Olšák | 2017-04-28 | 1 | -11/+20
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate out VS prolog key printing | Marek Olšák | 2017-04-28 | 1 | -19/+29
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: code shuffling in si_emit_derived_tess_state | Marek Olšák | 2017-04-28 | 1 | -31/+38
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate out TGSI initialization of si_shader_context | Marek Olšák | 2017-04-28 | 3 | -43/+72
  so that we can put multiple different TGSI shaders into one module.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/ac: move vertex export remove to common code. | Dave Airlie | 2017-04-27 | 3 | -163/+14
  This code can be shared by radv. We bump the max to VARYING_SLOT_MAX here, but that shouldn't have too much fallout.
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: disable the TGSI merge registers pass | Samuel Pitoiset | 2017-04-26 | 1 | -1/+1
  47109 shaders in 29632 tests
  Totals:
  SGPRS: 1917364 -> 1916620 (-0.04 %)
  VGPRS: 1165802 -> 1165202 (-0.05 %)
  Spilled SGPRs: 1880 -> 1843 (-1.97 %)
  Spilled VGPRs: 70 -> 65 (-7.14 %)
  Private memory VGPRs: 1184 -> 1184 (0.00 %)
  Scratch size: 1312 -> 1308 (-0.30 %) dwords per thread
  Code Size: 60211356 -> 60192268 (-0.03 %) bytes
  LDS: 1077 -> 1077 (0.00 %) blocks
  Max Waves: 428597 -> 428674 (0.02 %)
  Wait states: 0 -> 0 (0.00 %)
  Totals from affected shaders:
  SGPRS: 238173 -> 237429 (-0.31 %)
  VGPRS: 149556 -> 148956 (-0.40 %)
  Spilled SGPRs: 1263 -> 1226 (-2.93 %)
  Spilled VGPRs: 25 -> 20 (-20.00 %)
  Private memory VGPRs: 0 -> 0 (0.00 %)
  Scratch size: 20 -> 16 (-20.00 %) dwords per thread
  Code Size: 10457904 -> 10438816 (-0.18 %) bytes
  LDS: 50 -> 50 (0.00 %) blocks
  Max Waves: 41283 -> 41360 (0.19 %)
  Wait states: 0 -> 0 (0.00 %)
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
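The percentages in the entry above are plain relative changes between the "before" and "after" totals. The following is a minimal sketch of that arithmetic, not the tool that produced the report; the function name `percent_change` and the zero-baseline convention are assumptions made for illustration.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        # Assumed convention: report 0 when there is no baseline to compare against.
        return 0.0
    return (after - before) * 100.0 / before


# A few "Totals" rows copied from the commit message: (before, after).
totals = {
    "SGPRS": (1917364, 1916620),
    "VGPRS": (1165802, 1165202),
    "Spilled SGPRs": (1880, 1843),
    "Spilled VGPRs": (70, 65),
}

for name, (before, after) in totals.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:.2f} %)")
```

Rounded to two decimals, these four rows reproduce the figures quoted in the commit message.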
* gallium: add PIPE_SHADER_CAP_TGSI_SKIP_MERGE_REGISTERS | Samuel Pitoiset | 2017-04-26 | 1 | -0/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use unsynchronized transfers for shader binary uploads | Samuel Pitoiset | 2017-04-26 | 1 | -1/+2
  Because the buffer is new, it can't be referenced by any CS. This can save a few CPU cycles by skipping the whole PIPE_TRANSFER_UNSYNCHRONIZED if-block in amdgpu_bo_map().
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: turn si_shader_key::mono into a non-union | Marek Olšák | 2017-04-26 | 3 | -15/+11
  A merged LS-HS shader needs both fix_fetch and inputs_to_copy for compilation.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: adjust ESGS ring buffer size computation on VI | Marek Olšák | 2017-04-26 | 1 | -1/+4
  Cc: 17.0 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't set deprecated field PARTIAL_ES_WAVE_ON | Marek Olšák | 2017-04-26 | 1 | -2/+3
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: set MAX_PRIMGRP_IN_WAVE in the correct register | Marek Olšák | 2017-04-26 | 2 | -1/+5
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: add a workaround for viewing a slice of 3D as a 2D image | Marek Olšák | 2017-04-26 | 1 | -8/+22
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix 1D array shader images | Marek Olšák | 2017-04-26 | 1 | -0/+1
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix most things wrong with shader images | Marek Olšák | 2017-04-26 | 2 | -12/+24
  There are 2 major hw changes:
  - The address must always point to the address of level 0. GFX9 tiling modes don't allow binding to a non-0 level.
  - 3D must always be bound as 3D, because 2D and 3D use entirely different tiling modes, and the texture target determines which set of modes is used.
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: fix texture buffer objects and image buffers with IDXEN==0 | Marek Olšák | 2017-04-26 | 1 | -1/+34
  Cc: 17.1 <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: always flush asynchronously and wait after begin_new_cs | Marek Olšák | 2017-04-17 | 1 | -0/+3
  This hides the overhead of everything in the driver after the CS flush and before returning from pipe_context::flush. Only microbenchmarks will benefit. +2% FPS for glxgears.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove local variable 'mod' from si_compile_tgsi_shader | Marek Olšák | 2017-04-17 | 1 | -5/+2
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_shader_selector::vs_needs_prolog | Marek Olšák | 2017-04-17 | 3 | -7/+10
  cleanup
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set VGT_GS_MODE as part of the GS state | Marek Olšák | 2017-04-17 | 1 | -2/+0
  The VS state sets it.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't allow user indices with indirect draws | Marek Olšák | 2017-04-17 | 1 | -4/+4
  Not possible with GL and it will make future gallium rework easier. (also it's something I wouldn't like to support)
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: merge two if (indirect) statements | Marek Olšák | 2017-04-17 | 1 | -27/+25
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't mark non-dirty textures with CMASK as compressed | Marek Olšák | 2017-04-17 | 1 | -2/+3
  because the compression is skipped with non-dirty textures.
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: cope with missing disassembly | Nicolai Hähnle | 2017-04-14 | 1 | -1/+2
  For robustness and testing purposes.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_shader_viewport_layer_array | Nicolai Hähnle | 2017-04-14 | 1 | -1/+1
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: handle ignored LAYER and VIEWPORT_INDEX writes | Nicolai Hähnle | 2017-04-14 | 1 | -0/+20
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium: add PIPE_CAP_TGSI_TES_LAYER_VIEWPORT | Nicolai Hähnle | 2017-04-14 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: fix gl_BaseVertex in non-indexed draws | Nicolai Hähnle | 2017-04-13 | 3 | -4/+23
  gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the way they're implemented, the VGT always generates indices starting at 0, and the VS prolog adds the start index.
  There's a VGT_INDX_OFFSET register which causes the VGT to start at a driver-defined index. However, this register cannot be written from indirect draws.
  So fix this unlikely case by setting a bit to tell the VS whether the draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly when used.
  Fixes a bug in KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
  Reviewed-by: Marek Olšák <[email protected]>
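The fix described in the entry above can be modelled in a few lines. This is only a sketch of the logic the commit message states, not the driver's code: the function names and the `draw_is_indexed` flag (standing in for the bit passed to the VS) are hypothetical.

```python
def vertex_id(generated_index: int, start_or_base_vertex: int) -> int:
    """Model of the VS prolog: the VGT counts from 0 and the prolog adds the offset."""
    return generated_index + start_or_base_vertex


def gl_base_vertex(start_or_base_vertex: int, draw_is_indexed: bool) -> int:
    """Model of the fix: gl_BaseVertex must read as 0 for non-indexed draws.

    `draw_is_indexed` is a hypothetical name for the bit that tells the
    vertex shader which kind of draw is running.
    """
    return start_or_base_vertex if draw_is_indexed else 0


# Non-indexed draw starting at vertex 5: vertex IDs are offset, gl_BaseVertex is not.
print(vertex_id(2, 5), gl_base_vertex(5, draw_is_indexed=False))
# Indexed draw with base vertex 5: both see the offset.
print(vertex_id(2, 5), gl_base_vertex(5, draw_is_indexed=True))
```

The same offset feeds both values; only the flag decides whether the shader is allowed to see it as gl_BaseVertex.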
* radeonsi: provide VS_STATE input to all VS variants | Nicolai Hähnle | 2017-04-13 | 5 | -27/+18
  v2: fix incorrect change in get_tcs_out_patch_stride
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: change the bit-packing of LS out/TCS in data | Nicolai Hähnle | 2017-04-13 | 3 | -9/+14
  Avoid conflicts when merging various VS state bits.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit VS_STATE register explicitly from si_draw_vbo | Nicolai Hähnle | 2017-04-13 | 6 | -2/+27
  We will merge other derived state information into this register.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: extract derived tess state emit to higher level | Nicolai Hähnle | 2017-04-13 | 1 | -6/+7
  Especially with subsequent changes, this makes it easier to see the sequence of state emits at the higher level.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: drop support for TGSI_SEMANTIC_VERTEXID_NOBASE | Nicolai Hähnle | 2017-04-13 | 1 | -2/+3
  It is unused.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add new si_check_render_feedback_texture() helper | Samuel Pitoiset | 2017-04-10 | 1 | -45/+44
  For bindless.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add new si_decompress_color_texture() helper | Samuel Pitoiset | 2017-04-10 | 1 | -13/+17
  For bindless.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add new depth_needs_decompression() helper | Samuel Pitoiset | 2017-04-10 | 1 | -2/+8
  v2:
  - rename to depth_needs_decompression() instead
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add a 'break' in si_check_render_feedback_*() | Samuel Pitoiset | 2017-04-10 | 1 | -2/+6
  No need to check all color buffers.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: re-use 'desc' in si_set_shader_image() | Samuel Pitoiset | 2017-04-10 | 1 | -2/+1
  No need to compute the offset in the descriptor twice.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_shader_ballot | Nicolai Hähnle | 2017-04-05 | 1 | -1/+3
  Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into putting the lane select of llvm.amdgcn.readlane into a VGPR and then fails to continue to compile.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: optimization barriers to work around LLVM deficiencies | Nicolai Hähnle | 2017-04-05 | 1 | -4/+12
  Notably, llvm.amdgcn.readfirstlane and llvm.amdgcn.icmp may be hoisted out of loops or if/else branches in cases like
    if (cond) {
        v = readFirstInvocationARB(x);
        ... use v ...
    } else {
        v = readFirstInvocationARB(x);
        ... use v ...
    }
  ===>
    v = readFirstInvocationARB(x);
    if (cond) {
        ... use v ...
    } else {
        ... use v ...
    }
  The optimization barrier is a heavy hammer to stop that until LLVM is taught the semantics of the intrinsic properly.
  Reviewed-by: Marek Olšák <[email protected]>
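The hoisting in the entry above changes results because readFirstInvocationARB depends on which invocations are active, and that set differs inside each branch. The toy model below illustrates this with a four-lane "wave"; it is a simulation under simplifying assumptions, and the helper names (`read_first_invocation`, `per_branch`, `hoisted`) are made up for this sketch.

```python
def read_first_invocation(values, active):
    """Return the value held by the lowest-numbered active lane."""
    for lane, is_active in enumerate(active):
        if is_active:
            return values[lane]
    raise ValueError("no active lanes")


def per_branch(values, cond):
    """Original program: each branch reads with only its own lanes active."""
    result = [None] * len(values)
    for branch in (True, False):
        active = [c == branch for c in cond]
        if not any(active):
            continue  # no lane takes this branch
        v = read_first_invocation(values, active)
        for lane, is_active in enumerate(active):
            if is_active:
                result[lane] = v
    return result


def hoisted(values, cond):
    """Transformed program: one read before the branch, with every lane active."""
    v = read_first_invocation(values, [True] * len(values))
    return [v] * len(values)


x = [10, 20, 30, 40]
cond = [False, True, False, True]
print(per_branch(x, cond))  # lanes 1 and 3 read lane 1; lanes 0 and 2 read lane 0
print(hoisted(x, cond))     # every lane reads lane 0
```

In this model the two versions disagree on lanes 1 and 3, which is the kind of miscompile the barrier is meant to prevent.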
* radeonsi: strengthen emit_optimization_barrier | Nicolai Hähnle | 2017-04-05 | 2 | -4/+38
  LLVM will lift inline assembly out of if-else-blocks if both paths have the same inline assembly. Prevent this by adding an irrelevant unique text to the assembly. This requires the LLVM assembly parser to be initialized.
  Furthermore, allow forcing subsequent computations to happen after the optimization barrier by defining a data dependency.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit TGSI_OPCODE_READ_* | Nicolai Hähnle | 2017-04-05 | 1 | -0/+38
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit TGSI_OPCODE_BALLOT | Nicolai Hähnle | 2017-04-05 | 1 | -0/+18
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: implement TGSI_SEMANTIC_SUBGROUP_* | Nicolai Hähnle | 2017-04-05 | 1 | -0/+40
  64-bit system values are stored as v2i32 to simplify the fetch logic.
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: support 64-bit system values | Nicolai Hähnle | 2017-04-05 | 1 | -4/+20
  For simplicity, always store system values as 32-bit values or arrays of 32-bit values. 64-bit values are unpacked and packed accordingly.
  Reviewed-by: Marek Olšák <[email protected]>
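Storing a 64-bit value as two 32-bit words, as the entry above describes, amounts to a split and a recombine. The sketch below shows that round trip; the helper names and the low-word-first ordering are assumptions for illustration, since the commit message does not state the order.

```python
MASK32 = 0xFFFFFFFF


def unpack64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit words (order is an assumption)."""
    return value & MASK32, (value >> 32) & MASK32


def pack64(low: int, high: int) -> int:
    """Recombine two 32-bit words into one 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


value = 0x1122334455667788
low, high = unpack64(value)
print(hex(low), hex(high), hex(pack64(low, high)))
```

Keeping every stored element 32 bits wide means the fetch path only ever handles one element size, which matches the "simplify the fetch logic" rationale given for the v2i32 storage.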
* radeonsi: bump RADEON_LLVM_MAX_SYSTEM_VALUES | Nicolai Hähnle | 2017-04-05 | 2 | -1/+3
  ARB_shader_ballot introduces 7 new system values that can be used in all shader stages.
  Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_TGSI_BALLOT | Nicolai Hähnle | 2017-04-05 | 1 | -0/+1
  Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: enable ARB_sparse_buffer | Nicolai Hähnle | 2017-04-05 | 1 | -1/+10
  v2:
  - fill in DRM version requirement
  - disable on SI due to CP DMA faults
  Reviewed-by: Marek Olšák <[email protected]>