summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi
Commit message (Collapse)AuthorAgeFilesLines
* gallium: support for native fence fd'sRob Clark2016-12-011-0/+1
| | | | | | | This enables gallium support for EGL_ANDROID_native_fence_sync, for drivers which support PIPE_CAP_NATIVE_FENCE_FD. Signed-off-by: Rob Clark <[email protected]>
* radeonsi: add a tess+GS hang workaround for VI dGPUsMarek Olšák2016-12-011-2/+10
| | | | | | | ported from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't apply the Z export bug workaround to HainanMarek Olšák2016-12-011-2/+3
| | | | | | not needed Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a tessellation bug workaround for SIMarek Olšák2016-12-011-0/+7
| | | | | Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a TC L1 write corruption workaround for SIMarek Olšák2016-12-011-11/+23
| | | | | Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chipsMarek Olšák2016-12-014-4/+29
| | | | | | | All codepaths are handled except for clover. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: consolidate max-work-group-size computationMarek Olšák2016-12-011-24/+19
| | | | | | | The next commit will need this. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTSNicolai Hähnle2016-11-301-0/+1
| | | | | | | | | | | Drivers that support this benefit by saving one lowering pass in the GLSL-to-TGSI conversion. radeonsi already supports this because all outputs are stored in temporary variables before the export (except for TCS outputs, which have always been readable in TGSI anyway due to their special semantics). Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: don't fetch 8 dwords for samplerBuffer and imageBufferMarek Olšák2016-11-291-51/+43
| | | | | | | | | | | | | | The compiler doesn't shrink s_load_dwordx8, so we always wasted 4 SGPRs. Also, the extraction of the descriptor created some really ugly asm code with lots of VALU bitwise ops and v_readfirstlane. Totals from *affected* shaders: SGPRS: 13880 -> 13253 (-4.52 %) VGPRS: 15200 -> 15088 (-0.74 %) Code Size: 499864 -> 459816 (-8.01 %) bytes Max Waves: 1554 -> 1564 (0.64 %) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable XNACK to free 2 SGPRs on APUsMarek Olšák2016-11-291-1/+1
| | | | | | My LLVM commit disables it for dGPUs, but not APUs. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: count and report temp arrays in scratch separatelyMarek Olšák2016-11-292-4/+40
| | | | | | v2: only do this if debug output of shader dumping is enabled Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: don't try to eliminate trivial VS outputs for PS and CSMarek Olšák2016-11-291-1/+4
| | | | | | PS and CS don't have any param exports, so it's a no-op. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable RB+ blend optimizations for dual source blendingMarek Olšák2016-11-291-0/+11
| | | | | | | | This fixes dual source blending on Stoney. The fix was copied from Vulkan. The problem was discovered during internal testing. Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set CB_BLEND1_CONTROL.ENABLE for dual source blendingMarek Olšák2016-11-291-0/+4
| | | | | | | copied from Vulkan Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always set all blend registersMarek Olšák2016-11-291-5/+5
| | | | | | | better safe than sorry Cc: 13.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set the smallest possible CB_TARGET_MASKMarek Olšák2016-11-291-5/+5
| | | | | | better safe than sorry; set_framebuffer_state always makes this dirty Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't print bodies of header-only packetsMarek Olšák2016-11-291-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print unknown registers with correct formattingMarek Olšák2016-11-291-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print new opt flags in si_dump_shader_keyMarek Olšák2016-11-231-0/+9
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a debug flag that disables optimized shader variantsMarek Olšák2016-11-231-0/+5
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* util: import CRC32 implementation from galliumMarek Olšák2016-11-221-1/+1
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: remove all varyings for depth-only rendering or rasterization offMarek Olšák2016-11-213-1/+21
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate VS outputs that aren't used by PS at runtimeMarek Olšák2016-11-213-9/+61
| | | | | | | | | | | | | | | | | | A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker. If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready. All apps using separate shader objects I've seen had unused VS outputs. Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: record information about all written and read varyingsMarek Olšák2016-11-213-3/+98
| | | | | | | It's just tgsi_shader_info with DEFAULT_VAL varyings removed. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make si_shader_io_get_unique_index stricterMarek Olšák2016-11-212-11/+14
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabledMarek Olšák2016-11-214-5/+37
| | | | | | | | | This is the first user of optimized monolithic shader variants. Cull distances can't be disabled by states. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add infrastr. for compiling optimized shader variants asynchronouslyMarek Olšák2016-11-212-34/+109
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set vs.epilog.export_prim_id if TES is boundMarek Olšák2016-11-211-4/+4
| | | | | | | there is no VS epilog in this case Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify checking for monolithic compilationMarek Olšák2016-11-214-8/+9
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: print all flags in si_dump_shader_keyMarek Olšák2016-11-211-0/+5
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split the shader key into 3 logical partsMarek Olšák2016-11-215-194/+203
| | | | | | | | | key->part.*: prolog and epilog flags only key->as_{ls,es}: special flags key->mono.*: flags for monolithic compilation only Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix culling if clip & cull distances are used at the same timeMarek Olšák2016-11-211-2/+3
| | | | | | | | | Fixed piglits: - arb_cull_distance/clip-cull-3 - arb_cull_distance/clip-cull-4 Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: clean up si_emit_clip_regsMarek Olšák2016-11-211-4/+5
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: assume that a VS without POSITION is LSMarek Olšák2016-11-211-0/+7
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: decrease the number of texture slots to 24Marek Olšák2016-11-211-1/+1
| | | | | | | | | Company Of Heroes 2 needs only 24. This saves 512 bytes of CE RAM per shader stage. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fast exit si_emit_derived_tess_state earlyMarek Olšák2016-11-212-11/+15
| | | | | Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: check for !is_linear in do_hardware_msaa_resolveMarek Olšák2016-11-211-2/+4
| | | | | | | We don't want opt4Space here. Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add missing error-checking to si_create_compute_state (v2)Mun Gwan-gyeong2016-11-211-1/+5
| | | | | | | | | | | | | | | | When the uploading of shader fails on si_shader_binary_upload(), it returns -ENOMEM. We should handle si_shader_binary_upload() failure path on si_create_compute_state(). CID 1394027 v2: Fixes from Edward O'Callaghan's review a) Update explicitly return value check with "si_shader_binary_upload() < 0" b) Update commit message. Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: Fix resource leak in gs_copy_shader allocation failure pathGwan-gyeong Mun2016-11-221-1/+7
| | | | | | | | | CID 1394028 Signed-off-by: Mun Gwan-gyeong <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: store group_size_variable in struct si_computeNicolai Hähnle2016-11-211-5/+8
| | | | | | | | | | | | | | For compute shaders, we free the selector after the shader has been compiled, so we need to save this bit somewhere else. Also, make sure that this type of bug cannot re-appear, by NULL-ing the selector pointer after we're done with it. This bug has been there since the feature was added, but was only exposed in piglit arb_compute_variable_group_size-local-size by commit 9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated). Cc: 13.0 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit sample locations also when nr_samples == 1Nicolai Hähnle2016-11-181-1/+4
| | | | | | | | | | | | Since the state tracker now enables MSAA in the hardware for the case nr_samples == 1 as well, we need to set sample locations correctly for this case. The Polaris override is still needed for the non-MSAA case (when nr_samples == 0). Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: allow sample mask export for single-sample framebuffersNicolai Hähnle2016-11-181-4/+5
| | | | | | | This fixes GL45-CTS.sample_variables.mask.*.samples_1.*. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: fix a subtle bounds checking corner case with 3-component attributesNicolai Hähnle2016-11-163-2/+39
| | | | | | | | I'm also sending out a piglit test, gl-2.0/vertexattribpointer-size-3, which exposes this corner case. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: reject some 3-component formats as buffer texturesNicolai Hähnle2016-11-161-8/+35
| | | | | | | Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo. Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: set IF_THRESHOLD to 3Marek Olšák2016-11-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Piglit regressions (radeonsi or LLVM bugs, they pass on softpipe): - glsl-1.10/execution/variable-indexing/vs-output-array-vec3-index-wr - glsl-1.10/execution/variable-indexing/vs-output-array-vec4-index-wr - glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-col-row-wr - glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-row-wr Totals: SGPRS: 1132185 -> 1168801 (3.23 %) VGPRS: 907856 -> 906204 (-0.18 %) Spilled SGPRs: 2011 -> 2425 (20.59 %) Spilled VGPRs: 368 -> 96 (-73.91 %) Scratch VGPRs: 1344 -> 1060 (-21.13 %) dwords per thread Code Size: 35916164 -> 35705372 (-0.59 %) bytes LDS: 767 -> 767 (0.00 %) blocks Max Waves: 194010 -> 194921 (0.47 %) Wait states: 0 -> 0 (0.00 %) Before: VGPR SPILLING APPS Shaders SpillVGPR ScratchVGPR alien_isolation 2938 38 40 bioshock-infinite 1769 245 732 dirt-showdown 548 85 72 f1-2015 776 0 320 ue4_lightroom_inter.. 74 0 180 After: VGPR SPILLING APPS Shaders SpillVGPR ScratchVGPR alien_isolation 2938 38 40 bioshock-infinite 1769 0 480 dirt-showdown 548 58 40 f1-2015 776 0 320 ue4_lightroom_inter.. 74 0 180 Bioshock and DiRT benefit. If I set IF_THRESHOLD=4, tesseract starts spilling VGPRs Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLDMarek Olšák2016-11-151-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set unsafe fpmath on FP instructions when allowed by R600_DEBUGMarek Olšák2016-11-151-1/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fold some shader context initialization to si_llvm_context_initMarek Olšák2016-11-153-29/+30
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: accept is_store in image_fetch_rsrc instead of dcc_offMarek Olšák2016-11-101-4/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't rely on tgsi_scan::images_buffersMarek Olšák2016-11-101-8/+11
| | | | | | the instruction knows the target Reviewed-by: Nicolai Hähnle <[email protected]>