mesa.git

	Commit message	Author	Age	Files	Lines
*	radeonsi/gfx9: prevent shader-db crashes	Marek Olšák	2017-08-22	1	-1/+11
\| \| \| \| \| \| \|	- don't precompile LS and ES (they don't exist on GFX9), compile as VS instead - don't precompile HS and GS (we don't have LS and ES parts) Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: make si_shader_selector_reference globally visible	Nicolai Hähnle	2017-08-22	1	-15/+2
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: add a separate dirty mask for prefetches	Marek Olšák	2017-08-07	1	-2/+31
\| \| \| \| \| \| \| \| \| \|	so that we don't rely on si_pm4_state_enabled_and_changed, allowing us to move prefetches after draw calls. v2: clear the dirty mask after unbinding shaders Tested-by: Dieter Nützel <[email protected]> (v1) Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
*	radeonsi: add and use si_pm4_state_enabled_and_changed	Marek Olšák	2017-08-07	1	-6/+6
\| \| \| \| \|	Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: de-atomize L2 prefetch	Marek Olšák	2017-08-07	1	-1/+1
\| \| \| \| \| \| \|	I'd like to be able to move the prefetch call site around. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: tweak next-shader assumptions when streamout is used	Nicolai Hähnle	2017-07-31	1	-5/+11
\| \| \| \| \| \|	VS with streamout is always a HW VS. Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi/nir: perform lowering of input/output driver locations	Nicolai Hähnle	2017-07-31	1	-0/+2
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: bypass the shader cache for NIR shaders	Nicolai Hähnle	2017-07-31	1	-2/+3
\| \| \| \|	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: scan NIR shaders to obtain required info	Nicolai Hähnle	2017-07-31	1	-6/+17
\| \| \| \| \| \|	v2: set num_instruction to 2, i.e. 1 + END (Marek) Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: simplify computation of tessellation offchip buffers	Marek Olšák	2017-07-17	1	-15/+4
\| \| \| \| \| \|	This is overly cautious, but better safe than sorry. Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium: use "ull" number suffix to keep the QtCreator parser happy	Marek Olšák	2017-07-10	1	-5/+5
\| \| \| \| \| \| \|	It can't parse "llu". Reviewed-by: Thomas Helland <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
*	radeonsi: move instance divisors into a constant buffer	Marek Olšák	2017-06-27	1	-2/+10
	Shader key size: 107 -> 47
	Divisors of 0 and 1 are encoded in the shader key. Greater instance divisors are loaded from a constant buffer.
	The shader code doing the division is huge. Is it something we need to worry about? Does any app use instance divisors >= 2?
	VS prolog disassembly:
	    s_load_dwordx4 s[12:15], s[0:1], 0x80 ; C00A0300 00000080
	    s_nop 0 ; BF800000
	    s_waitcnt lgkmcnt(0) ; BF8C007F
	    s_buffer_load_dword s14, s[12:15], 0x4 ; C0220386 00000004
	    s_waitcnt lgkmcnt(0) ; BF8C007F
	    v_cvt_f32_u32_e32 v4, s14 ; 7E080C0E
	    v_rcp_iflag_f32_e32 v4, v4 ; 7E084704
	    v_mul_f32_e32 v4, 0x4f800000, v4 ; 0A0808FF 4F800000
	    v_cvt_u32_f32_e32 v4, v4 ; 7E080F04
	    v_mul_hi_u32 v5, v4, s14 ; D2860005 00001D04
	    v_mul_lo_i32 v6, v4, s14 ; D2850006 00001D04
	    v_cmp_eq_u32_e64 s[12:13], 0, v5 ; D0CA000C 00020A80
	    v_sub_i32_e32 v5, vcc, 0, v6 ; 340A0C80
	    v_cndmask_b32_e64 v5, v6, v5, s[12:13] ; D1000005 00320B06
	    v_mul_hi_u32 v5, v5, v4 ; D2860005 00020905
	    v_add_i32_e32 v6, vcc, v5, v4 ; 320C0905
	    v_subrev_i32_e32 v4, vcc, v5, v4 ; 36080905
	    v_cndmask_b32_e64 v4, v4, v6, s[12:13] ; D1000004 00320D04
	    v_mul_hi_u32 v5, v4, v1 ; D2860005 00020304
	    v_add_i32_e32 v4, vcc, s8, v0 ; 32080008
	    v_mul_lo_i32 v6, v5, s14 ; D2850006 00001D05
	    v_add_i32_e32 v7, vcc, 1, v5 ; 320E0A81
	    v_cmp_ge_u32_e64 s[12:13], v1, v6 ; D0CE000C 00020D01
	    v_sub_i32_e32 v6, vcc, v1, v6 ; 340C0D01
	    v_cmp_le_u32_e32 vcc, s14, v6 ; 7D960C0E
	    v_cndmask_b32_e64 v8, 0, -1, s[12:13] ; D1000008 00318280
	    v_cndmask_b32_e64 v6, 0, -1, vcc ; D1000006 01A98280
	    v_and_b32_e32 v6, v8, v6 ; 260C0D08
	    v_cmp_eq_u32_e32 vcc, 0, v6 ; 7D940C80
	    v_cndmask_b32_e32 v6, v7, v5, vcc ; 000C0B07
	    v_add_i32_e32 v5, vcc, -1, v5 ; 320A0AC1
	    v_cmp_eq_u32_e32 vcc, 0, v8 ; 7D941080
	    v_cndmask_b32_e32 v5, v6, v5, vcc ; 000A0B06
	    v_add_i32_e32 v5, vcc, s9, v5 ; 320A0A09
	v2: set prefer_mono for fetched instance divisors
	Reviewed-by: Nicolai Hähnle <[email protected]>
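	As a rough illustration of what the prolog in this commit computes, here is a minimal C sketch. The function name and parameters are hypothetical and not taken from the driver. It assumes the usual Gallium meaning of the divisor: 0 selects per-vertex fetching, 1 advances once per instance, and larger values need a real unsigned division, which is the part the disassembly expands into a long instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index used to fetch one vertex attribute.
 * Divisors 0 and 1 are the cheap cases the commit keeps in the shader key;
 * anything larger is the "fetched" case that needs a division. */
static inline uint32_t attrib_fetch_index(uint32_t divisor,
                                          uint32_t vertex_id,
                                          uint32_t base_vertex,
                                          uint32_t instance_id,
                                          uint32_t start_instance)
{
    if (divisor == 0)
        return base_vertex + vertex_id;       /* not instanced */
    if (divisor == 1)
        return start_instance + instance_id;  /* one step per instance */

    /* Divisor loaded at run time (from a constant buffer in the commit). */
    assert(divisor >= 2);
    return start_instance + instance_id / divisor;
}
```

	Only the last branch involves a division, which matches the commit's question about whether any application actually uses divisors of 2 or more.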
*	Revert "radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs"	Marek Olšák	2017-06-27	1	-3/+2
\| \| \| \| \| \|	This reverts commit 7b2240ac9ce3ba9bd86f4ae8aac53af8878c0b10. Reviewed-by: Nicolai Hähnle <[email protected]>
*	Revert "radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy"	Marek Olšák	2017-06-27	1	-6/+2
\| \| \| \| \| \| \| \|	This reverts commit 6b6fed3a3c81c2b0d319ef121df20a0dc914705f. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use the correct LLVMTargetMachineRef in si_build_shader_variant	Nicolai Hähnle	2017-06-22	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \|	si_build_shader_variant can actually be called directly from one of normal-priority compiler threads. In that case, the thread_index is only valid for the normal tm array. v2: - use the correct sel/shader->compiler_ctx_state Fixes: 86cc8097266c ("radeonsi: use a compiler queue with a low priority for optimized shaders") Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: track use of bindless samplers/images from tgsi_shader_info	Samuel Pitoiset	2017-06-14	1	-5/+26
\| \| \| \| \| \| \| \| \|	This adds some new helper functions to know if the current draw call (or dispatch compute) is using bindless samplers/images, based on TGSI analysis. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: replace si_vertex_elements::elements with separate fields	Marek Olšák	2017-06-12	1	-5/+2
\| \| \| \| \| \|	It makes si_vertex_elements a little smaller. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove 8 bytes from si_shader_key with uint32_t ff_tcs_inputs_to_copy	Marek Olšák	2017-06-12	1	-2/+6
\| \| \| \| \| \|	The previous patch helps with this. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use uint32_t to declare si_shader_key.opt.kill_outputs	Marek Olšák	2017-06-12	1	-2/+3
\| \| \| \| \| \|	the next patch will benefit from this Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove 8 bytes from si_shader_key by flattening opt.hw_vs	Marek Olšák	2017-06-12	1	-6/+6
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't update dependent states if it has no effect (v2)	Marek Olšák	2017-06-08	1	-5/+19
	This and the previous clip_regs commit decrease IB sizes and the number of si_update_shaders invocations as follows:
	                    IB size    si_update_shaders calls
	    Borderlands 2     -10%       -27%
	    Deus Ex: MD        -5%       -11%
	    Talos Principle    -8%       -30%
	v2: always dirty cb_render_state in set_framebuffer_state
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: update clip_regs on shader state changes only when it's needed	Marek Olšák	2017-06-07	1	-3/+32
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: precompute some fields for PA_CL_VS_OUT_CNTL in si_shader_selector	Marek Olšák	2017-06-07	1	-0/+16
\| \| \| \| \|	Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add a new helper si_get_vs	Marek Olšák	2017-06-07	1	-4/+2
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove 8 bytes from si_shader_key	Marek Olšák	2017-06-07	1	-4/+4
\| \| \| \| \| \| \|	We can use a union in si_shader_key::mono. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: move PSIZE and CLIPDIST unique IO indices after GENERIC	Marek Olšák	2017-06-07	1	-1/+3
	Heaven LDS usage for LS+HS is below. The masks are "outputs_written" for LS and HS. Note that 32K is the maximum size.
	Before:
	    heaven_x64: ls=1f1 tcs=1f1, lds=32K
	    heaven_x64: ls=31 tcs=31, lds=24K
	    heaven_x64: ls=71 tcs=71, lds=28K
	After:
	    heaven_x64: ls=3f tcs=3f, lds=24K
	    heaven_x64: ls=7 tcs=7, lds=13K
	    heaven_x64: ls=f tcs=f, lds=17K
	All other apps have a similar decrease in LDS usage, because the "outputs_written" masks are similar. Also, most apps don't write POSITION in these shader stages, so there is room for improvement. (tight per-component input/output packing might help even more)
	It's unknown whether this improves performance.
	Tested-by: Edmondo Tommasina <[email protected]>
	Tested-by: Dieter Nützel <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
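	The before and after masks in this commit have the same number of bits set; only the position of the highest bit moves. The sketch below makes that visible. It assumes, which the log does not state, that space is reserved for every slot up to the highest written index, so gaps in the mask cost memory. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of slots actually written (population count of the mask). */
static inline unsigned slots_written(uint64_t mask)
{
    unsigned n = 0;
    for (; mask; mask &= mask - 1)
        n++;
    return n;
}

/* Number of slots reserved under the assumption above:
 * index of the highest set bit plus one. */
static inline unsigned slots_reserved(uint64_t mask)
{
    unsigned n = 0;
    for (; mask; mask >>= 1)
        n++;
    return n;
}

/* Slots reserved but never written, i.e. holes below the highest bit. */
static inline unsigned slots_wasted(uint64_t mask)
{
    unsigned reserved = slots_reserved(mask);
    unsigned written = slots_written(mask);
    assert(written <= reserved);
    return reserved - written;
}
```

	For the first Heaven entry, the mask 1f1 spans 9 slots while writing 6, and the reordered mask 3f spans 6 slots while writing the same 6.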
*	radeonsi/gfx9: prevent a race when the previous shader's main part is missing	Marek Olšák	2017-06-07	1	-0/+2
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: wait for main part compilation of 1st shaders of merged shaders	Marek Olšák	2017-06-07	1	-0/+4
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: fix LS scratch buffer support without TCS for GFX9	Marek Olšák	2017-06-07	1	-3/+18
\| \| \| \| \| \| \| \| \| \|	LS is merged into TCS. If there is no TCS, LS is merged into fixed-func TCS. The problem is the fixed-func TCS was ignored by scratch update functions, so LS didn't have the scratch buffer set up. Note that Mesa 17.1 doesn't have merged shaders. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: move streamout state update out of si_update_shaders	Marek Olšák	2017-06-07	1	-16/+24
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: move handling of DBG_NO_OPT_VARIANT into si_shader_selector_key	Marek Olšák	2017-06-07	1	-4/+3
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: use a compiler queue with a low priority for optimized shaders	Marek Olšák	2017-06-07	1	-4/+4
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: drop unfinished shader compilations when destroying shaders	Marek Olšák	2017-06-07	1	-2/+3
\| \| \| \| \| \| \|	If we enqueue too many jobs and destroy the GL context, it may take several seconds before the jobs finish. Just drop them instead. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: only upload (dump to L2) those descriptors that are used by shaders	Marek Olšák	2017-05-18	1	-0/+6
\| \| \| \| \| \| \|	This decreases the size of CE RAM dumps to L2, or the size of descriptor uploads without CE. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: record which descriptor slots are used by shaders	Marek Olšák	2017-05-18	1	-0/+27
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: rename tcs_tes_uses_prim_id for clarity	Nicolai Hähnle	2017-05-16	1	-6/+6
\| \| \| \| \| \| \| \|	What we care about is whether PrimID is used while tessellation is enabled; whether it's used in TCS/TES or further down the pipeline is irrelevant. Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: fix gl_PrimitiveIDIn in geometry shader when using tessellation	Nicolai Hähnle	2017-05-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	This builds on commit 0549ea15ec38 ("radeonsi: fix primitive ID in fragment shader when using tessellation"). Fixes piglit arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test Cc: 17.1 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: get rid of secondary input/output word	Nicolai Hähnle	2017-05-12	1	-19/+3
\| \| \| \| \| \| \|	By keeping track of fewer generics, everything can fit into 64 bits. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: skip generic out/in indices without a shader IO index	Nicolai Hähnle	2017-05-12	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	OpenGL uses at most 32 generic outputs/inputs in any stage, and they always have a shader IO index and therefore fit into the outputs_written/ inputs_read/kill_outputs fields. However, Nine uses semantic indices more liberally. We support that in VS-PS pipelines, except that the optimization of killing outputs must be skipped. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: use SI_MAX_IO_GENERIC instead of magic values	Nicolai Hähnle	2017-05-12	1	-2/+2
\| \| \| \| \|	Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: split per-patch from per-vertex indices	Nicolai Hähnle	2017-05-08	1	-3/+3
\| \| \| \| \| \| \|	Make it a bit clearer that the index spaces are logically separate by having them defined in different functions. Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: load patch_id for TES-as-ES when exporting for PS	Nicolai Hähnle	2017-05-08	1	-2/+2
\| \| \| \| \| \| \|	For some reason, this change is only necessary on SI. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: fix primitive ID in fragment shader when using tessellation	Nicolai Hähnle	2017-05-08	1	-10/+17
\| \| \| \| \| \| \| \| \| \| \|	In a VS->TCS->TES->PS pipeline, the primitive ID is read from TES exports, so it is as if TES were using the primitive ID. Specifically, this fixes a bug where the primitive ID is not reset at the start of a new instance. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi/gfx9: allow the scratch buffer in HS and GS	Marek Olšák	2017-05-05	1	-10/+0
\| \| \| \| \| \|	It works now. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: prevent race conditions when doing scratch patching	Marek Olšák	2017-05-05	1	-2/+30
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: separate scratch state patching code into its own function	Marek Olšák	2017-05-05	1	-46/+55
\| \| \| \| \| \| \|	Picked from a different branch. When we stop using the scratch patching, this function will not be called. Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: also apply scratch relocations to the 1st shader of merged shaders	Marek Olšák	2017-05-05	1	-0/+3
\| \| \| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove unused parameters from si_shader_apply_scratch_relocs	Marek Olšák	2017-05-05	1	-1/+1
\| \| \| \|	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi/gfx9: fix gl_ViewportIndex	Marek Olšák	2017-05-03	1	-2/+11
\| \| \| \| \| \| \|	v2: remove unnecessary LLVMBuildAnd calls Cc: 17.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pass tessellation ring addresses via user SGPRs	Marek Olšák	2017-04-28	1	-12/+38
	This removes s_load_dword latency for tess rings. We need just 1 SGPR for the address if we use 64K alignment.
	The final asm for recreating the descriptor is:
	    // s2 is (address >> 16)
	    s_mov_b32 s3, 0
	    s_lshl_b64 s[4:5], s[2:3], 16
	    s_mov_b32 s6, -1
	    s_mov_b32 s7, 0x27fac
	v2: bitcast the descriptor type from v2i64 to v4i32
	Reviewed-by: Nicolai Hähnle <[email protected]>
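	The asm in this commit can be mirrored in C to show why a single SGPR is enough. This is a minimal sketch with hypothetical function names; the constants are copied from the listed instructions, and their meaning as descriptor fields is not interpreted here.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 64K-aligned address into the one 32-bit value passed to the shader. */
static inline uint32_t pack_ring_address(uint64_t va)
{
    assert((va & 0xffff) == 0);        /* 64K alignment: low 16 bits are zero */
    assert((va >> 16) <= UINT32_MAX);  /* shifted address must fit in 32 bits */
    return (uint32_t)(va >> 16);
}

/* Rebuild the four descriptor dwords the way the listed asm does. */
static inline void rebuild_ring_descriptor(uint32_t packed, uint32_t desc[4])
{
    /* s_mov_b32 s3, 0 ; s_lshl_b64 s[4:5], s[2:3], 16 */
    uint64_t va = (uint64_t)packed << 16;

    desc[0] = (uint32_t)va;          /* s4: low half of the address */
    desc[1] = (uint32_t)(va >> 32);  /* s5: high half of the address */
    desc[2] = 0xffffffffu;           /* s_mov_b32 s6, -1 */
    desc[3] = 0x27fac;               /* s_mov_b32 s7, 0x27fac */
}
```

	Because the low 16 bits of the address are known to be zero, dropping them loses nothing, and the shader recovers the full address with one shift instead of a memory load.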