path: root/src/gallium/drivers/radeonsi/si_shader.h
Commit message | Author | Age | Files | Lines
* radeonsi: add si_shader_selector::vs_needs_prolog | Marek Olšák | 2017-04-17 | 1 | -0/+1
    cleanup
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix gl_BaseVertex in non-indexed draws | Nicolai Hähnle | 2017-04-13 | 1 | -0/+2
    gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the way they're implemented, the VGT always generates indices starting at 0, and the VS prolog adds the start index.
    There's a VGT_INDX_OFFSET register which causes the VGT to start at a driver-defined index. However, this register cannot be written from indirect draws.
    So fix this unlikely case by setting a bit to tell the VS whether the draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly when used.
    Fixes a bug in KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
    Reviewed-by: Marek Olšák <[email protected]>
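The adjustment described in this commit can be sketched in plain C. This is only an illustration of the logic, not the shader code the driver emits: the bit name `VS_STATE_INDEXED`, its position, and the helper `effective_base_vertex` are hypothetical. The only behavior taken from the commit message is that gl_BaseVertex must read as 0 when the draw is not indexed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit; the real position inside the VS state word is not
 * given in the commit message. */
#define VS_STATE_INDEXED (1u << 0)

/* start_or_base is the offset the VS prolog adds to the VGT-generated
 * index. For indexed draws it is also the value of gl_BaseVertex; for
 * non-indexed draws gl_BaseVertex must be 0. */
static int32_t effective_base_vertex(uint32_t vs_state, int32_t start_or_base)
{
    return (vs_state & VS_STATE_INDEXED) ? start_or_base : 0;
}

/* Usage: the same offset yields different gl_BaseVertex values. */
void base_vertex_demo(void)
{
    assert(effective_base_vertex(VS_STATE_INDEXED, 16) == 16);
    assert(effective_base_vertex(0, 16) == 0);
}
```

Passing the flag as a state bit, as the commit describes, lets the decision be made per draw without relying on a register that indirect draws cannot write.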
* radeonsi: provide VS_STATE input to all VS variants | Nicolai Hähnle | 2017-04-13 | 1 | -12/+1
    v2: fix incorrect change in get_tcs_out_patch_stride
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: change the bit-packing of LS out/TCS in data | Nicolai Hähnle | 2017-04-13 | 1 | -2/+7
    Avoid conflicts when merging various VS state bits.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit VS_STATE register explicitly from si_draw_vbo | Nicolai Hähnle | 2017-04-13 | 1 | -0/+5
    We will merge other derived state information into this register.
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_mutex with mtx_t | Timothy Arceri | 2017-03-07 | 1 | -1/+1
    pipe_mutex was made unnecessary with fd33a6bcd7f12.
    Reviewed-by: Marek Olšák <[email protected]>
* radeon/ac: switch from radeon_shader_binary to ac_shader_binary | Timothy Arceri | 2017-02-28 | 1 | -6/+5
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: skip TESSINNER/OUTER offchip stores if TES doesn't read them | Marek Olšák | 2017-02-21 | 1 | -0/+1
    We were unconditionally storing these outputs, sometimes even one component at a time, but apps never read them in TES.
    Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can easily disable them on demand.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix UINT/SINT clamping for 10-bit formats on <= CIK | Nicolai Hähnle | 2017-02-21 | 1 | -0/+1
    The same PS epilog workaround as for 8-bit integer formats is required, since the CB doesn't do clamping.
    Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.
    Cc: [email protected]
    Reviewed-by: Marek Olšák <[email protected]>
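The clamping that the color buffer does not perform can be sketched in C as follows. This is a CPU-side illustration, not the PS epilog code: the helper names are hypothetical, and the channel widths (10 bits for RGB, 2 bits for alpha) are assumed from the 10_10_10_2 format layout rather than stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an unsigned integer color value to the range of a 'bits'-wide
 * unsigned channel, e.g. [0, 1023] for 10 bits. */
static uint32_t clamp_uint(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    uint32_t max = (1u << bits) - 1u;
    return v > max ? max : v;
}

/* Clamp a signed integer color value to the range of a 'bits'-wide
 * two's-complement channel, e.g. [-512, 511] for 10 bits. */
static int32_t clamp_sint(int32_t v, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    int32_t max = (int32_t)((1u << (bits - 1)) - 1u);
    int32_t min = -max - 1;
    if (v > max)
        return max;
    if (v < min)
        return min;
    return v;
}
```

With these helpers, an RGB channel would be clamped with `bits = 10` and the alpha channel with `bits = 2`.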
* radeonsi: use SI_MAX_ATTRIBS where it should be used | Marek Olšák | 2017-02-18 | 1 | -1/+1
    for consistency; no change in behavior
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: sort members of si_shader_key::part | Marek Olšák | 2017-02-18 | 1 | -6/+6
    and improve some comments
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: have separate LS and ES main shader parts in the shader selector | Marek Olšák | 2017-02-18 | 1 | -0/+16
    This might reduce the on-demand compilation if the initial VS/LS/ES determination is wrong.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add a workaround for clamping unaligned RGB 8 & 16-bit vertex loads | Marek Olšák | 2017-02-18 | 1 | -0/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make fix_fetch an array of uint8_t | Marek Olšák | 2017-02-18 | 1 | -3/+2
    so that we can add 3-component fallbacks.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement legacy GL_DOUBLE vertex formats | Marek Olšák | 2017-02-14 | 1 | -0/+4
    so that we can disable u_vbuf for GL core profiles.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: write shader asm annotated with wave info into GPU hang reports | Marek Olšák | 2017-02-10 | 1 | -0/+1
    Note that the disassembly is written twice - first the unmodified compiler output and then the wave-annotated output only if there are waves executing the shader.
    Sample output from a real GPU hang most likely caused by image_sample:
        The number of active waves = 28
        Pixel Shader - annotated disassembly:
        s_mov_b64 s[6:7], exec ; BE86017E [PC=0x10f3e3800, off=0, size=4]
        s_wqm_b64 exec, exec ; BEFE077E [PC=0x10f3e3804, off=4, size=4]
        ...
        image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7 ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8]
        s_buffer_load_dword s20, s[0:3], 0x50 ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8]
        s_load_dwordx4 s[24:27], s[4:5], 0x170 ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8]
        s_load_dwordx8 s[12:19], s[4:5], 0x140 ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8]
        s_buffer_load_dword s11, s[0:3], 0x5c ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8]
        s_buffer_load_dword s21, s[0:3], 0x54 ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8]
        s_buffer_load_dword s22, s[0:3], 0x58 ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8]
        s_waitcnt vmcnt(0) ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4]
        ^ SE0 SH0 CU1 SIMD1 WAVE0 EXEC=aaaaaaa555aaaaaa INST32=BF8C0F70
        ^ SE0 SH0 CU1 SIMD2 WAVE0 EXEC=aaaa85555555552a INST32=BF8C0F70
        ^ SE0 SH0 CU1 SIMD3 WAVE0 EXEC=000000000000000a INST32=BF8C0F70
        ^ SE0 SH0 CU6 SIMD1 WAVE0 EXEC=25a5a5aa82aaaaaa INST32=BF8C0F70
        ^ SE0 SH0 CU6 SIMD3 WAVE0 EXEC=50aaaa8fffa55555 INST32=BF8C0F70
        ^ SE0 SH0 CU7 SIMD0 WAVE0 EXEC=5554aaaaaaa1a555 INST32=BF8C0F70
        ^ SE0 SH0 CU7 SIMD0 WAVE1 EXEC=aaaa5555ffffffff INST32=BF8C0F70
        ^ SE0 SH0 CU7 SIMD1 WAVE0 EXEC=555557aaaaaaaaa5 INST32=BF8C0F70
        ^ SE0 SH0 CU7 SIMD3 WAVE0 EXEC=5555aaaaaaaaaa85 INST32=BF8C0F70
        ^ SE1 SH0 CU3 SIMD1 WAVE0 EXEC=aaaaaaaaaaaaaaaa INST32=BF8C0F70
        ^ SE1 SH0 CU4 SIMD0 WAVE0 EXEC=aaaaaaaa5a5a5a5a INST32=BF8C0F70
        ^ SE1 SH0 CU4 SIMD1 WAVE0 EXEC=aaaaaaa5a5a5a4a5 INST32=BF8C0F70
        ^ SE1 SH0 CU4 SIMD2 WAVE0 EXEC=5555555000000000 INST32=BF8C0F70
        ^ SE1 SH0 CU4 SIMD3 WAVE0 EXEC=aa555554155aaaaa INST32=BF8C0F70
        ^ SE1 SH0 CU5 SIMD0 WAVE0 EXEC=55ffff55555555aa INST32=BF8C0F70
        ^ SE1 SH0 CU5 SIMD1 WAVE0 EXEC=555555555aaaaaaa INST32=BF8C0F70
        ^ SE1 SH0 CU5 SIMD2 WAVE0 EXEC=a0aaaaaaa8555555 INST32=BF8C0F70
        ^ SE1 SH0 CU5 SIMD3 WAVE0 EXEC=8aaaaaaaaaaaa555 INST32=BF8C0F70
        ^ SE1 SH0 CU6 SIMD0 WAVE0 EXEC=000000002aaaaaaa INST32=BF8C0F70
        ^ SE2 SH0 CU1 SIMD0 WAVE0 EXEC=5aaaa5400aaaa15a INST32=BF8C0F70
        ^ SE2 SH0 CU1 SIMD1 WAVE0 EXEC=00aaaaaaaa5555aa INST32=BF8C0F70
        ^ SE2 SH0 CU1 SIMD2 WAVE0 EXEC=aa00005555554555 INST32=BF8C0F70
        ^ SE2 SH0 CU1 SIMD3 WAVE0 EXEC=aaaaaaa000000000 INST32=BF8C0F70
        ^ SE3 SH0 CU4 SIMD0 WAVE0 EXEC=5555aaaaaaaaaaaa INST32=BF8C0F70
        ^ SE3 SH0 CU4 SIMD2 WAVE0 EXEC=ffaaaaaaaaaa5555 INST32=BF8C0F70
        ^ SE3 SH0 CU4 SIMD3 WAVE0 EXEC=aaaa55555555aa00 INST32=BF8C0F70
        ^ SE3 SH0 CU5 SIMD0 WAVE0 EXEC=00aaaaaaaaaaaa5a INST32=BF8C0F70
        ^ SE3 SH0 CU5 SIMD1 WAVE0 EXEC=5a555555005555ff INST32=BF8C0F70
        v_mul_f32_e32 v7, s6, v7 ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4]
        ...
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use the correct target machine when building shader variants | Marek Olšák | 2017-01-18 | 1 | -0/+2
    If the shader selector is created with a different context than the shader variant, we should use the calling context's target machine for the shader variant.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99419
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move shader pipe context state into a separate structure | Marek Olšák | 2017-01-18 | 1 | -6/+14
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement GL_FIXED vertex format | Marek Olšák | 2017-01-16 | 1 | -0/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement 32-bit SNORM/UNORM/SSCALED/USCALED vertex formats | Marek Olšák | 2017-01-16 | 1 | -3/+9
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make fix_fetch 64-bit | Marek Olšák | 2017-01-16 | 1 | -2/+2
    v2: add u_bit_consecutive64
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: cleanly communicate whether si_shader_dump should check R600_DEBUG | Marek Olšák | 2017-01-09 | 1 | -1/+1
    Tested-by: Edmondo Tommasina <[email protected]>
    Acked-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips | Marek Olšák | 2016-12-01 | 1 | -0/+2
    All codepaths are handled except for clover.
    Cc: 13.0 <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: count and report temp arrays in scratch separately | Marek Olšák | 2016-11-29 | 1 | -0/+1
    v2: only do this if debug output of shader dumping is enabled
    Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: eliminate VS outputs that aren't used by PS at runtime | Marek Olšák | 2016-11-21 | 1 | -4/+3
    A past commit added the ability to compile "optimized" shader variants asynchronously (not stalling the app). This commit builds upon that and adds what is basically a runtime shader linker.
    If a VS output isn't used by the currently-bound PS, a new VS compilation is started without that output. The new shader variant is used when it's ready.
    All apps using separate shader objects I've seen had unused VS outputs.
    Eliminating unused/useless VS outputs also eliminates the corresponding vertex attribute loads.
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
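The core decision of such a runtime linker can be sketched with bitmasks. The following C is a minimal illustration under stated assumptions: one bit per varying slot, with both masks using the same slot numbering. The function names are hypothetical and the real driver tracks more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption of this sketch: bit N set means varying slot N is written
 * by the VS (first mask) or read by the bound PS (second mask). */
static uint64_t unused_vs_outputs(uint64_t vs_outputs_written,
                                  uint64_t ps_inputs_read)
{
    return vs_outputs_written & ~ps_inputs_read;
}

/* An optimized VS variant is only worth compiling if at least one
 * written output is not consumed by the currently bound PS. */
static bool wants_optimized_variant(uint64_t vs_outputs_written,
                                    uint64_t ps_inputs_read)
{
    return unused_vs_outputs(vs_outputs_written, ps_inputs_read) != 0;
}

/* Usage: VS writes slots 0..3, PS reads slots 0 and 2. */
void runtime_link_demo(void)
{
    assert(unused_vs_outputs(0xF, 0x5) == 0xA);
    assert(wants_optimized_variant(0xF, 0x5));
    assert(!wants_optimized_variant(0x3, 0x3));
}
```

Because the optimized variant is compiled asynchronously, as the commit states, the unoptimized shader stays in use until the new one is ready.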
* radeonsi: record information about all written and read varyings | Marek Olšák | 2016-11-21 | 1 | -3/+7
    It's just tgsi_shader_info with DEFAULT_VAL varyings removed.
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled | Marek Olšák | 2016-11-21 | 1 | -2/+3
    This is the first user of optimized monolithic shader variants.
    Cull distances can't be disabled by states.
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add infrastr. for compiling optimized shader variants asynchronously | Marek Olšák | 2016-11-21 | 1 | -0/+7
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify checking for monolithic compilation | Marek Olšák | 2016-11-21 | 1 | -0/+1
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split the shader key into 3 logical parts | Marek Olšák | 2016-11-21 | 1 | -26/+39
    key->part.*: prolog and epilog flags only
    key->as_{ls,es}: special flags
    key->mono.*: flags for monolithic compilation only
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix vertex fetches for 2_10_10_10 formats | Nicolai Hähnle | 2016-11-04 | 1 | -0/+11
    The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader.
    The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases.
    Reviewed-by: Marek Olšák <[email protected]>
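To illustrate why an unsigned 2-bit alpha needs fixing for signed formats, here is a small C sketch. It assumes the workaround starts from the raw 2-bit field as an unsigned integer, which the commit message does not state; the helper names are hypothetical and the real fix is emitted as shader code.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a raw 2-bit field (0..3) as two's complement (-2..1). */
static int32_t sign_extend_2bit(uint32_t raw)
{
    assert(raw <= 3u);
    return raw >= 2u ? (int32_t)raw - 4 : (int32_t)raw;
}

/* SNORM conversion for a 2-bit channel: the scale factor is
 * 2^(2-1) - 1 = 1, and the result is clamped to -1.0 from below. */
static float alpha_snorm_2bit(uint32_t raw)
{
    int32_t s = sign_extend_2bit(raw);
    return s < -1 ? -1.0f : (float)s;
}
```

For example, a raw value of 3 reads as 3 when treated as unsigned, but should be -1 (SINT/SSCALED) or -1.0 (SNORM) when the format is signed.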
* radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation | Nicolai Hähnle | 2016-11-03 | 1 | -0/+10
    Fixes GL45-CTS.geometry_shader.adjacency.adjacency_indiced_triangle_strip and others.
    This leaves the case of triangle strips with adjacency and primitive restarts open. It seems that the only thing that cares about that is a piglit test. Fixing this efficiently would be really involved, and I don't want to use the hammer of degrading to software handling of indices because there may well be software that uses this draw mode (without caring about the precise rotation of triangles).
    v2:
    - skip the GS prolog entirely if workaround is not needed
    - only check for TES (TES is always non-null when tessellation is used)
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make the GS copy shader owned by the GS selector | Nicolai Hähnle | 2016-11-03 | 1 | -1/+8
    The copy shader only depends on the selector. This change avoids creating separate code paths for monolithic vs. non-monolithic geometry shaders.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: eliminate trivial constant VS outputs | Marek Olšák | 2016-10-19 | 1 | -0/+11
    These constant value VS PARAM exports:
    - 0,0,0,0
    - 0,0,0,1
    - 1,1,1,0
    - 1,1,1,1
    can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports can be removed from the IR to save export & parameter memory.
    After LLVM optimizations, analyze the IR to see which exports are equal to the ones listed above (or undef) and remove them if they are.
    Targeted use cases:
    - All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them are unused by PS (such as Witcher 2 below).
    - VS output arrays with unused elements that the GLSL compiler can't eliminate (such as Batman below).
    The shader-db deltas are quite interesting (not from upstream si-report.py, it won't be upstreamed):
        PERCENTAGE DELTAS        Shaders   PARAM exports (affected only)
        batman_arkham_origins        589   -67.17 %
        bioshock-infinite           1769    -0.47 %
        dirt-showdown                548    -2.68 %
        dota2                       1747    -3.36 %
        f1-2015                      776    -4.94 %
        left_4_dead_2               1762    -0.07 %
        metro_2033_redux            2670    -0.43 %
        portal                       474    -0.22 %
        talos_principle              324    -3.63 %
        warsow                       176    -2.20 %
        witcher2                    1040   -73.78 %
        ----------------------------------------
        All affected                 991   -65.37 % ... 9681 -> 3353
        ----------------------------------------
        Total                      26725   -10.82 % ... 58490 -> 52162
    v2: treat Undef as both 0 and 1
    Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
    Tested-by: Edmondo Tommasina <[email protected]> (v1)
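The matching step described in this commit can be sketched in C. This is an illustration only: the driver analyzes LLVM IR, whereas this sketch takes already-extracted channel values. The `struct channel` type and `default_val_index` are hypothetical, and the returned index simply follows the order in which the commit message lists the four patterns; whether that equals the hardware DEFAULT_VAL encoding is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One exported channel: either undefined or a known constant. */
struct channel {
    bool undef;
    float value;
};

/* Returns 0..3 if the export matches one of the four trivial patterns
 * (in the order listed in the commit message), or -1 otherwise.
 * Per the v2 note, an undef channel matches both 0 and 1. */
static int default_val_index(const struct channel out[4])
{
    static const float patterns[4][4] = {
        {0.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, 0.0f, 0.0f, 1.0f},
        {1.0f, 1.0f, 1.0f, 0.0f},
        {1.0f, 1.0f, 1.0f, 1.0f},
    };

    assert(out != NULL);
    for (int p = 0; p < 4; p++) {
        bool match = true;
        for (int c = 0; c < 4; c++) {
            if (!out[c].undef && out[c].value != patterns[p][c])
                match = false;
        }
        if (match)
            return p;
    }
    return -1;
}
```

An export that yields a non-negative index could be dropped from the VS and supplied to the PS as a default value instead; anything else has to stay a real export.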
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings | Marek Olšák | 2016-10-13 | 1 | -1/+0
    The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: support ARB_compute_variable_group_size | Nicolai Hähnle | 2016-10-10 | 1 | -1/+3
    Not sure if it's possible to avoid programming the block size twice (once for the userdata and once for the dispatch).
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up lucky #include dependencies | Marek Olšák | 2016-10-04 | 1 | -32/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: export SampleMask from pixel shaders at full rate | Marek Olšák | 2016-09-13 | 1 | -0/+2
    Heaven and Valley write gl_SampleMask and not Z. Use 16_ABGR instead of 32_ABGR if Z isn't written.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add DRAWID parameter to vertex shaders | Nicolai Hähnle | 2016-08-09 | 1 | -1/+3
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: pre-generate shader logs for ddebug | Marek Olšák | 2016-07-26 | 1 | -0/+7
    This cuts down the overhead of si_dump_shader when ddebug is capturing shader logs, which is done for every draw call unconditionally (that's quite a lot of work for a draw call).
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move the shader key dumping to si_shader_dump | Marek Olšák | 2016-07-26 | 1 | -1/+0
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: report accurate SGPR and VGPR spills | Marek Olšák | 2016-07-13 | 1 | -0/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: do compilation from si_create_shader_selector asynchronously | Marek Olšák | 2016-07-05 | 1 | -0/+1
    Main shader parts and geometry shaders are compiled asynchronously by util_queue. si_create_shader_selector doesn't wait and returns. si_draw_vbo(si_shader_select) waits for completion.
    This has the best effect when shaders are compiled at app-loading time. It doesn't help much for shaders compiled on demand, even though VS+PS compilation should take as much time as the bigger one of the two.
    If an app creates more shaders, at most 4 threads will be used to compile them.
    Debug output disables this for shader stats to be printed in the correct order.
    (We could go even further and build variants asynchronously too, then emit draw calls without waiting and emit incomplete shader states, then force IB chaining to give the compiler more time, then sync the compilation at the IB flush and patch the IB with correct shader states. This is great for compilation before draw calls, but there are some difficulties such as scratch and tess states requiring the compiler output, and an on-disk shader cache will likely be a much better and simpler solution.)
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate the compilation chunk of si_create_shader_selector | Marek Olšák | 2016-07-05 | 1 | -0/+7
    The function interface is ready to be used by util_queue.
    Also, si_shader_select_with_key can no longer accept si_context.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't interpolate colors if flatshading is enabled | Marek Olšák | 2016-07-05 | 1 | -1/+1
    use v_interp_mov for those
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable the barycentric optimization in all cases | Marek Olšák | 2016-07-05 | 1 | -5/+2
    Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled. This should increase the PS launch rate for big primitives with MSAA.
    Based on discussion with SPI guys.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled | Marek Olšák | 2016-07-05 | 1 | -2/+2
    This should increase the PS launch rate for shaders using at least 2 pairs of perspective (i,j) and same for linear.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: split ps.prolog.force_persample_interp into persp and linear bits | Marek Olšák | 2016-07-05 | 1 | -1/+2
    This reduces the number of v_mov's in the prolog.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable WQM in PS prolog when needed | Nicolai Hähnle | 2016-06-07 | 1 | -0/+1
    WQM is needed when the PS prolog computes a VGPR that is consumed by a shader with (implicit or explicit) derivatives.
    Depends on http://reviews.llvm.org/D20839 / LLVM r272063 for this to be effective (otherwise it's just a no-op).
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
    Cc: 12.0 <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Remove LDS layout user SGPR's from TES. | Bas Nieuwenhuizen | 2016-05-26 | 1 | -7/+8
    They are unused.
    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>