aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/radeonsi/si_shader.h
Commit message (Collapse)AuthorAgeFilesLines
* radeonsi: Use signed char for color_interp_vgpr_indexTimothy Pearson2018-07-181-1/+1
| | | | | | | | | | | | color_interp_vgpr_index was declared as a generic char value. Because signed values are used in this variable, the result was not safe across architectures and crashed on ppc64[el] and arm. Declare color_interp_vgpr_index as a signed type. Signed-off-by: Timothy Pearson <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* radeonsi: rename si_compiler -> ac_llvm_compilerDave Airlie2018-07-041-11/+5
| | | | | | | | As precursor to moving init to common code, just rename the struct and move it. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac: move all LLVM module initialization into ac_create_moduleMarek Olšák2018-07-021-2/+0
| | | | | | This removes some ugly code around module initialization. Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: implement vertex color clamping for tess and GSMarek Olšák2018-06-281-1/+1
|
* radeonsi: move VS_STATE_SGPR before draw SGPRsMarek Olšák2018-06-281-3/+6
| | | | for vertex color clamping.
* radeonsi: store compute local_size into tgsi_shader_infoMarek Olšák2018-06-281-3/+3
| | | | This is kinda a hack, but it's enough for the shader cache.
* radeonsi: clean up passing the is_monolithic flag for compilationMarek Olšák2018-06-251-1/+0
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: fix passing gl_ClipVertex for GS and tessMarek Olšák2018-05-251-1/+1
| | | | | | Also add the fprintf call. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix color inputs/outputs for GS and tessMarek Olšák2018-05-251-2/+4
| | | | | | | | | | GS is tested, tessellation is untested. Have outputs_written_before_ps for HW VS and outputs_written for other stages. The reason is that COLOR and BCOLOR alias for HW VS, which drives elimination of VS outputs based on PS inputs. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move data_layout into si_compilerMarek Olšák2018-04-271-0/+1
| | | | | | Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move passmgr into si_compilerMarek Olšák2018-04-271-0/+1
| | | | | | Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move target_library_info into si_compilerMarek Olšák2018-04-271-0/+1
| | | | | | Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add triple into si_compilerMarek Olšák2018-04-271-0/+1
| | | | | | Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add struct si_compiler containing LLVMTargetMachineRefMarek Olšák2018-04-271-4/+9
| | | | | | | | It will contain more variables. Reviewed-by: Timothy Arceri <[email protected]> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove r600_pipe_common.hMarek Olšák2018-04-271-1/+6
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: update copyrightsMarek Olšák2018-04-051-0/+1
| | | | Acked-by: Timothy Arceri <[email protected]>
* radeonsi: implement GL_KHR_blend_equation_advancedMarek Olšák2018-04-021-0/+3
| | | | | | | MSAA is supported using sample shading. Layered rendering and all texture targets are also supported. Tested-by: Dieter Nützel <[email protected]>
* radeonsi: remove chip_class parameter from si_lower_nirMarek Olšák2018-03-081-3/+1
| | | | | | | We can get it from si_screen. Reviewed-by: Timothy Arceri <[email protected]> Acked-by: Alex Deucher <[email protected]>
* radeonsi/nir: call ac_lower_indirect_derefs()Timothy Arceri2018-03-051-1/+1
| | | | | | | | Fixes piglit tests: tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd.shader_test tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: add chip class to compiler_ctx_stateTimothy Arceri2018-03-051-0/+2
| | | | | | This will be used in the following patch. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointersMarek Olšák2018-02-261-1/+8
| | | | | | The effect of the last 13 commits on user SGPR counts: Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR inputMarek Olšák2018-02-261-5/+4
| | | | | | | | so that it can be removed and replaced with inline VBO descriptors, and the pointer can be packed in unused bits of VBO descriptors. This also removes the pointer from merged TES-GS where it's useless. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRsMarek Olšák2018-02-241-5/+1
| | | | | | | TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address aligned to 512KB. Hey, it's a 13-bit pointer! Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move 2nd-shader descriptor pointers into s[0:1]Marek Olšák2018-02-241-23/+14
| | | | | | | | | | | If 32-bit pointers are supported, both pointers can be moved into s[0:1] and then ESGS has exactly the same user data SGPR declarations as VS. If 32-bit pointers are not supported, only one pointer can be moved into s[0:1]. In that case, the 2nd pointer is moved before TCS constants, so that the location is the same in HS and GS. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement 32-bit pointers in user data SGPRs (v2)Marek Olšák2018-02-171-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | User SGPRs changes: VS: 14 -> 9 TCS: 14 -> 10 TES: 10 -> 6 GS: 8 -> 4 GSCOPY: 2 -> 1 PS: 9 -> 5 Merged VS-TCS: 24 -> 16 Merged VS-GS: 18 -> 11 Merged TES-GS: 18 -> 11 SGPRS: 2170102 -> 2158430 (-0.54 %) VGPRS: 1645656 -> 1641516 (-0.25 %) Spilled SGPRs: 9078 -> 8810 (-2.95 %) Spilled VGPRs: 130 -> 114 (-12.31 %) Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread Code Size: 52094872 -> 52692540 (1.15 %) bytes Max Waves: 371848 -> 372723 (0.24 %) v2: - the shader cache needs to take address32_hi into account - set amdgpu-32bit-address-high-bits Reviewed-by: Samuel Pitoiset <[email protected]> (v1)
* radeonsi: print shader-db stats for main parts, not final binariesMarek Olšák2018-01-311-0/+2
| | | | | | This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move max_simd_waves computation into a separate functionMarek Olšák2018-01-311-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add dummy implementation of si_nir_scan_tess_ctrl()Timothy Arceri2018-01-051-0/+3
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make use of ac_get_spi_shader_z_format()Samuel Pitoiset2017-12-141-2/+0
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES partNicolai Hähnle2017-11-281-1/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/gfx9: fix VM fault with fetched instance divisorsNicolai Hähnle2017-11-201-3/+1
| | | | | | | | | We need to account for SGPR locations in merged shaders. This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations Fixes: 79c2e7388c7f ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON") Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: use ready fences on all shaders, not just optimized onesNicolai Hähnle2017-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race condition between si_shader_select_with_key and si_bind_XX_shader: Thread 1 Thread 2 -------- -------- si_shader_select_with_key begin compiling the first variant (guarded by sel->mutex) si_bind_XX_shader select first_variant by default as state->current si_shader_select_with_key match state->current and early-out Since thread 2 never takes sel->mutex, it may go on rendering without a PM4 for that shader, for example. The solution taken by this patch is to broaden the scope of shader->optimized_ready to a fence shader->ready that applies to all shaders. This does not hurt the fast path (if anything it makes it faster, because we don't explicitly check is_optimized). It will also allow reducing the scope of sel->mutex locks, but this is deferred to a later commit for better bisectability. Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: remove 'Authors:' commentsMarek Olšák2017-11-021-5/+0
| | | | | | | It's inaccurate. Instead, see the copyright and use "git log" and "git blame" to know the authorship. Acked-by: Nicolai Hähnle <[email protected]>
* radeonsi: use postponed KILL only when derivatives are usedMarek Olšák2017-10-241-0/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: if there's just const buffer 0, set it in place of CONST/SSBO pointerMarek Olšák2017-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SI_SGPR_CONST_AND_SHADER_BUFFERS now contains the pointer to const buffer 0 if there is no other buffer there. Benefits: - there is no constbuf descriptor upload and shader load It's assumed that all constant addresses are within bounds. Non-constant addresses are clamped against the last declared CONST variable. This only works if the state tracker ensures the bound constant buffer matches what the shader needs. Once we get 32-bit pointers, we can only do this for user constant buffers where the driver is in charge of the upload so that it can guarantee a 32-bit address. The real performance benefit might not be measurable. These apps get 100% theoretical benefit in all shaders (except where noted): - antichamber - barman arkham origins - borderlands 2 - borderlands pre-sequel - brutal legend - civilization BE - CS:GO - deadcore - dota 2 -- most shaders - europa universalis - grid autosport -- most shaders - left 4 dead 2 - legend of grimrock - life is strange - payday 2 - portal - rocket league - serious sam 3 bfe - talos principle - team fortress 2 - thea - unigine heaven - unigine valley -- also sanctuary and tropics - wasteland 2 - xcom: enemy unknown & enemy within - tesseract - unity (engine) Changed stats only: SGPRS: 2059998 -> 2086238 (1.27 %) VGPRS: 1626888 -> 1626904 (0.00 %) Spilled SGPRs: 7902 -> 7865 (-0.47 %) Code Size: 60924520 -> 60982660 (0.10 %) bytes Max Waves: 374539 -> 374526 (-0.00 %) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add VS blit shader creationMarek Olšák2017-10-071-0/+12
| | | | | | no users yet Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: hard-code pixel center for interpolateAtSample without multisample ↵Nicolai Hähnle2017-09-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | buffers The GLSL rules for interpolateAtSample are unfortunate: "Returns the value of the input interpolant variable at the location of sample number sample. If multisample buffers are not available, the input variable will be evaluated at the center of the pixel. If sample sample does not exist, the position used to interpolate the input variable is undefined." This fix will fallback to monolithic shader compilation when interpolateAtSample is used without multisampling. One alternative would be to always upload 16 sample positions, filling the buffer up with repetition when the actual number of samples is less, and then ANDing the sample ID with 0xf. However, that punishes all well-behaving users of interpolateAtSample, when in reality, only conformance tests should be affected by the issue. Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.* Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: apply a mask to gl_SampleMaskIn in the PS prologNicolai Hähnle2017-09-131-1/+4
| | | | | | | | | | | | | gl_SampleMaskIn is supposed to contain set bits only for the samples that are covered by the current fragment shader invocation, but the VGPR initialization hardware loads the set of all bits that are covered at the current pixel. Fixes various tests in dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.* Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: optimize TCS epilog when invocation 0 writes tess factorsMarek Olšák2017-09-111-0/+2
| | | | | | | | | | This removes the barrier and LDS stores and loads for tess factors when it's possible. The removal of the barrier seems more important to me though. In one shader, it removes 17 * 4 bytes from the shader binary. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bugNicolai Hähnle2017-09-061-0/+1
| | | | | | | | | | | | | | | | | | | When the HS wave is empty, the hardware writes the LS VGPRs starting at v0 instead of v2. Workaround by shifting them back into place when necessary. For simplicity, this is always done in the LS prolog. According to the hardware team, this will be fixed in future chips, so take that into account already. Note that this is not a bug fix, as the bug was already worked around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround for a tessellation driver bug"). This change merely replaces the workaround by one that should be better. v2: add workaround code to shader only when necessary v3: clarify the prefer_mono comment Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: declare new user SGPR indices for bindless samplers/imagesSamuel Pitoiset2017-08-221-1/+3
| | | | | | | | | A new pair of user SGPR is needed for loading the bindless descriptors from shaders. Because the descriptors are global for all stages, there is no need to add separate indices for GFX9. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make si_shader_selector_reference globally visibleNicolai Hähnle2017-08-221-0/+14
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: perform lowering of input/output driver locationsNicolai Hähnle2017-07-311-0/+1
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: scan NIR shaders to obtain required infoNicolai Hähnle2017-07-311-0/+4
| | | | | | v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add si_shader_selector::nirNicolai Hähnle2017-07-311-0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move instance divisors into a constant bufferMarek Olšák2017-06-271-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader key size: 107 -> 47 Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer. The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2? VS prolog disassembly: s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080 s_nop 0 ; BF800000 s_waitcnt lgkmcnt(0) ; BF8C007F s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004 s_waitcnt lgkmcnt(0) ; BF8C007F v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E v_rcp_iflag_f32_e32 v4, v4 ; 7E084704 v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000 v_cvt_u32_f32_e32 v4, v4 ; 7E080F04 v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04 v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04 v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80 v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80 v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06 v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905 v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905 v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905 v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04 v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304 v_add_i32_e32 v4, vcc, s8, v0 ; 32080008 v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05 v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81 v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01 v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01 v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280 v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280 v_and_b32_e32 v6, v8, v6 ; 260C0D08 v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80 v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07 v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1 v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080 v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06 v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09 v2: set prefer_mono for fetched instance divisors Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use #pragma pack to pack si_shader_keyMarek Olšák2017-06-271-0/+8
| | | | | | | | | sizeof(struct si_shader_key): Before reverting the 2 commits: 120 bytes After reverting the 2 commits: 128 bytes With #pragma pack: 107 bytes Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"Marek Olšák2017-06-271-2/+1
| | | | | | This reverts commit 7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10. Reviewed-by: Nicolai Hähnle <[email protected]>
* Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ↵Marek Olšák2017-06-271-2/+1
| | | | | | | | ff_tcs_inputs_to_copy" This reverts commit 6b6fed3a3c81c2b0d319ef121df20a0dc914705f. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: include ac_binary.h for struct ac_shader_binaryEmil Velikov2017-06-171-2/+2
| | | | | | | | | | | | | | The header embeds the struct so it needs the header inclusion instead of the dummy forward declaration. Cc: Nicolai Hähnle <[email protected]> Cc: Marek Olšák <[email protected]> Cc: Tom Stellard <[email protected]> Fixes: 32206c5e560 ("radeonsi: Add radeon_shader_binary member to struct si_shader") Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Tested-by: Bas Nieuwenhuizen <[email protected]>