mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir: Remove a bunch of large stack arrays	Jason Ekstrand	2019-07-22	4	-6/+15
\| \| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	intel/fs: Stop stack allocating large arrays	Jason Ekstrand	2019-07-22	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally, we haven't worried too much about stack sizes as Linux tends to be fairly friendly towards large stacks. However, when running DXVK apps under wine, we're suddenly subject to Windows' more stringent stack limitations and can run out of space more easily. In particular, some of the shaders in Elite Dangerous: Horizons have quite a few registers and the arrays in split_virtual_grfs are large enough to blow a 1 MiB stack leading to crashes during shader compilation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662 Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
*	egl/android: Update color_buffers querying for buffer age	Nataraj Deshpande	2019-07-22	2	-6/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	color_buffers[] is currently hard coded to 3 for android which fails in droid_window_dequeue_buffer when ANativeWindow creates color_buffers >3 while querying buffer age during dEQP partial_update tests on chromeOS. The patch removes static color_buffers[], queries for MIN_UNDEQUEUED_BUFFERS, sets native window buffer count and allocates the correct number of color_buffers as per android. Fixes dEQP-EGL.functional.partial_update* tests on chromebooks with enabling EGL_KHR_partial_update. v2: update comment instead of removing (Eric Engestrom) v3: change static array to dynamic allocated color_buffers querying MIN_UNDEQUEUED_BUFFERS (Chia-I Wu [email protected]) Fixes: 2acc69da8ce "EGL/Android: Add EGL_EXT_buffer_age extension" Signed-off-by: Nataraj Deshpande <[email protected]> Acked-by: Eric Engestrom <[email protected]> Reviewed-by: Chia-I Wu <[email protected]>
*	intel/compiler: Use nir_opt_conditional_discard	Caio Marcelo de Oliveira Filho	2019-07-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	anv vkpipeline-db results for SKL: total instructions in shared programs: 3622461 -> 3611281 (-0.31%) instructions in affected programs: 396452 -> 385272 (-2.82%) helped: 2062 HURT: 1 total cycles in shared programs: 1458144669 -> 1458105320 (<.01%) cycles in affected programs: 4171830 -> 4132481 (-0.94%) helped: 1874 HURT: 180 total loops in shared programs: 2437 -> 2437 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 8745 -> 8748 (0.03%) spills in affected programs: 8 -> 11 (37.50%) helped: 1 HURT: 1 total fills in shared programs: 23392 -> 23395 (0.01%) fills in affected programs: 8 -> 11 (37.50%) helped: 1 HURT: 1 LOST: 0 GAINED: 1 No changes to shader-db on i965 or iris. The glsl compiler already does a similar optimization. Improvement suggested by Daniel Schürmann. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	pan/decode: Disable magic divisor debugging	Alyssa Rosenzweig	2019-07-22	1	-0/+2
\| \| \| \| \| \| \|	Memory corruption (for both legitimate and illegitimate reasons) causes this to hang pantrace. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Report spills:fills to shader-db	Alyssa Rosenzweig	2019-07-22	3	-2/+12
\| \| \| \| \| \| \|	Route this info through so we can track how we're doing on register spilling. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Reenable pipeline register creation	Alyssa Rosenzweig	2019-07-22	1	-10/+9
\| \| \| \| \| \| \|	This was disabled to permit regression-free RA work. Now that the spill code is in place, we can reenable, with some caveats about efficacy. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Report tls_size	Alyssa Rosenzweig	2019-07-22	4	-0/+13
\| \| \| \| \| \| \| \|	Pipe through the number of bytes of spilled memory used from the compiler into the main driver, where it will be used to allocate the Thread Local Storage buffer. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost: Set `initialized` in more cases	Alyssa Rosenzweig	2019-07-22	2	-10/+9
\| \| \| \| \| \| \|	Indirect linear writes were not being marked as initialized, causing the back blit to be dropped, breaking the listed tests. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/ci: Update expectations	Alyssa Rosenzweig	2019-07-22	1	-4/+0
\| \| \| \| \| \|	We've fixed some shader tests. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Promote to move, not rewrite for non-SSA	Alyssa Rosenzweig	2019-07-22	1	-2/+9
\| \| \| \| \| \|	Fixes promoted uniform loads to registers. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Dump MIR of RA failure	Alyssa Rosenzweig	2019-07-22	1	-1/+3
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard; Dump successor graph when printing MIR	Alyssa Rosenzweig	2019-07-22	1	-2/+12
\| \| \| \| \| \| \|	We just use the pointers of the midgard_block*, which is crude, but it gets the point across and will help debug successor related issues. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Remove debug statement	Alyssa Rosenzweig	2019-07-22	1	-2/+0
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Implement register spilling	Alyssa Rosenzweig	2019-07-22	4	-54/+158
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we run RA in a loop, before each iteration after a failed allocation we choose a spill node and spill it to Thread Local Storage using st_int4/ld_int4 instructions (for spills and fills respectively). This allows us to compile complex shaders that normally would not fit within the 16 work register limits, although it comes at a fairly steep performance penalty. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Add mir_has_arg helper	Alyssa Rosenzweig	2019-07-22	1	-0/+12
\| \| \| \| \| \|	Helps scan the MIR for uses of an index. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Check write-before-read in liveness analysis	Alyssa Rosenzweig	2019-07-22	1	-0/+13
\| \| \| \| \| \| \| \|	If we write to an index before reading it, the old copy we're checking liveness for isn't live in this block, even if it does get read later. Fixes abnormally high register pressure in shaders with loops. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard/disasm: Check for certain tag errors	Alyssa Rosenzweig	2019-07-22	1	-0/+18
\| \| \| \| \| \| \| \|	Midgard bundles contain a tag, as well as a copy of the tag of the next bundle to facilitate prefetch. Do some simple static analysis to detect certain tag errors (particularly on shaders without branching). Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add OP_IS_CSEL helper	Alyssa Rosenzweig	2019-07-22	1	-0/+7
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add mir_rewrite_index_src_single helper	Alyssa Rosenzweig	2019-07-22	2	-6/+13
\| \| \| \| \| \| \|	Rather than rewriting an index away across the whole block, we expose finer (per-instruction) granularity for rewrites. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Ignore inline_constant in liveness	Alyssa Rosenzweig	2019-07-22	1	-0/+3
\| \| \| \| \| \|	It doesn't make any sense to look at it. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Implement load/store scratch opcodes	Alyssa Rosenzweig	2019-07-22	4	-2/+52
\| \| \| \| \| \| \| \|	These are used to load/store from Thread Local Storage, which is memory allocated per-thread (corresponding to ctx->scratchpad in the command stream) and used for register spilling. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midg/disasm: Check for int varying ops	Alyssa Rosenzweig	2019-07-22	1	-0/+4
\| \| \| \|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Remove "aliasing"	Alyssa Rosenzweig	2019-07-22	2	-96/+0
\| \| \| \| \| \| \|	It was a crazy idea that didn't pan out. We're better served by a good copyprop pass. It's also unused now. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost: Promote uniform registers late	Alyssa Rosenzweig	2019-07-22	6	-82/+174
\| \| \| \| \| \| \| \| \|	Rather than creating either a load or a uniform register read with a fixed beginning offset, we always create a load and then promote to a uniform register later. This will allow us to promote in a register pressure aware manner. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Call scheduler/RA in a loop	Alyssa Rosenzweig	2019-07-22	3	-13/+27
\| \| \| \| \| \| \| \| \| \| \| \|	This will allow us to insert instructions as a result of register allocation, permitting spilling to be implemented. As a side effect, with the assert commented out this would fix a bunch of glamor crashes (due to RA failures) so MATE becomes useable. Ideally we'll have scheduling or RA actually sorted out before the branch point but if not this gives us a one-line out to get X working... Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Remove custom register selection callback	Alyssa Rosenzweig	2019-07-22	1	-19/+0
\| \| \| \| \| \|	What we have is equivalent to the default callback; let's use that. Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	radv: fix crash in vkCmdClearAttachments with unused attachment	Samuel Pitoiset	2019-07-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	depth_stencil_attachment and/or ds_resolve attachment can be NULL. This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.* Cc: 19.1 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	i965: free object labels when deleting	Sergii Romantsov	2019-07-22	3	-0/+3
\| \| \| \| \| \| \| \|	Some leaks detected with GL_KHR_debug on i965. CC: Timothy Arceri <[email protected]> Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
*	radv/gfx10: update descriptors for inline uniform blocks	Samuel Pitoiset	2019-07-22	1	-3/+10
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: emit the GS NGG prologue before the nested barrier	Samuel Pitoiset	2019-07-22	1	-6/+1
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: do not allocate space for the ZPASS_DONE bug	Samuel Pitoiset	2019-07-22	1	-6/+8
\| \| \| \| \| \| \|	GFX10 isn't affected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors	Samuel Pitoiset	2019-07-22	1	-4/+4
\| \| \| \| \| \| \|	This field doesn't exist. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: clean up fill_geom_tess_rings()	Samuel Pitoiset	2019-07-22	1	-25/+9
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: change a bunch of >= GFX9 to == GFX9	Samuel Pitoiset	2019-07-22	4	-15/+15
\| \| \| \| \|	Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac/nir: do not clamp shadow reference on GFX10	Samuel Pitoiset	2019-07-22	1	-2/+6
\| \| \| \| \| \| \| \|	RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10 and RADV doesn't promotes Z16 or Z24. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: move nir_opt_conditional_discard out of optimization loop	Daniel Schürmann	2019-07-22	1	-1/+1
\| \| \| \| \| \| \| \|	This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop. No changes on vkpipeline-db. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	v3d: fill logicop_func in the fragment shader key when precompiling shaders	Iago Toral Quiroga	2019-07-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were trigger lowerinng of logic ops on precompiled shaders, which we don't want to do. Also, this had the side effect of making shader-db crash, as during this lowering we would try to read the color format swizzle information from the fragment shader key that we don't populate in precompiled shaders because right now we only need it when logic operations are enabled. Reviewed-by: Eric Anholt <[email protected]>
*	v3d: Avoid scheduling an instruction that stalls waiting for SFU retval	Jose Maria Casanova Crespo	2019-07-22	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we detect that a scheduling candidate will stall because having a register source that is the written by the SFU unit in the previous instruction we reduce its priority so any non stalling operation would be chosen. The latency of SFU operations is defined as 2. So they would be scheduled earlier if other candidates have the same priority. Finally we won't merge instructions that stall to a previously chosen one. As the result of the previous one would be waiting for an extra cycle. Although shader-db result show that instruction are hurt with an increase of 0.35% the sum of instructions + stalls is reduced a 0.52%. And the total of sfu-stalls is reduced a 63.51%. It implies also a small increase in the max-temps metric because of scheduling earlier SFU operations. total instructions in shared programs: 9102719 -> 9117851 (0.17%) instructions in affected programs: 4324628 -> 4339760 (0.35%) helped: 4162 HURT: 12128 helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1 helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51% HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1 HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68% 95% mean confidence interval for instructions value: 0.90 0.96 95% mean confidence interval for instructions %-change: 0.47% 0.50% Instructions are HURT. total max-temps in shared programs: 1327728 -> 1327812 (<.01%) max-temps in affected programs: 4730 -> 4814 (1.78%) helped: 61 HURT: 134 helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1 helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26% 95% mean confidence interval for max-temps value: 0.28 0.58 95% mean confidence interval for max-temps %-change: 1.80% 3.52% Max-temps are HURT. total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%) sfu-stalls in affected programs: 95029 -> 31802 (-66.53%) helped: 25882 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2 helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00% 95% mean confidence interval for sfu-stalls value: -2.47 -2.42 95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54% Sfu-stalls are helped. total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%) inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%) helped: 22728 HURT: 855 helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92% HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86% 95% mean confidence interval for inst-and-stalls value: -2.07 -2.01 95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05% Inst-and-stalls are helped. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <[email protected]>
*	v3d: add shader-db stat to count SFU stalls	Jose Maria Casanova Crespo	2019-07-22	5	-14/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	SFU operations have a latency of 2 cicles, so if their results are used in the following cycle to a SFU instruction, the GPU stalls for an extra cycle until the result is available. This adds the number of stalls to the shader-db debug mode and sum of instruction + stalls to evaluate optimizations to schedule instructions that avoid generating sfu-stalls. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <[email protected]>
*	radv: replace memset()+strcpy() with snprintf()	Eric Engestrom	2019-07-21	1	-3/+1
\| \| \| \| \| \| \|	Just like the next line :) Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: drop unnecessary memset() before snprintf()	Eric Engestrom	2019-07-21	1	-1/+0
\| \| \| \| \| \| \|	snprintf() always terminates the string. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radv: Fix uninitialized warning.	Bas Nieuwenhuizen	2019-07-21	1	-1/+2
\| \| \| \| \| \| \| \|	For es_vgpr_comp_cnt. Fixes: 795adbbadd4 "radv/gfx10: Add pipeline state support for tess." Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
*	virgl: fix a sync issue in virgl_buffer_transfer_extend	Chia-I Wu	2019-07-19	1	-62/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend a previously queued transfer instead if it can find one. Comparing to virgl_resource_transfer_prepare, it fails to check if the resource is busy. The existence of a previously queued transfer normally implies that the resource is not busy, maybe except for when the transfer is PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy comment, and potential concerns over breaking it as the transfer code evolves, this commit makes the valid_buffer_range check the only condition to take the fast path. In real world, we hit the fast path almost only because of the valid_buffer_range check. In micro benchmarks, the condition should always be true, otherwise the benchmarks are not very representative of meaningful workloads. I think this fix is justified. The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the fast path. This commit re-enables it as well. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
*	virgl: rework virgl_transfer_queue_extend	Chia-I Wu	2019-07-19	3	-25/+24
\| \| \| \| \| \| \| \|	Do not take a transfer and do the memcpy. Add a _buffer suffix to the function name to make it clear that it is only for buffers. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
*	virgl: fix virgl_buffer_transfer_extend	Chia-I Wu	2019-07-19	1	-0/+1
\| \| \| \| \| \| \| \|	Without setting hw_res, virgl_transfer_queue_extend never finds a match and always returns NULL. Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Gurchetan Singh <[email protected]>
*	radeonsi: initialize scissor registers etc. without clear state	Marek Olšák	2019-07-19	1	-1/+1
\| \| \| \| \|	Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
*	radeonsi: return success from vi_dcc_clear_level to simplify callers	Marek Olšák	2019-07-19	3	-28/+26
\| \| \| \| \|	Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
*	radeonsi: fix compute-based culling regression in 1ce52c1e373	Marek Olšák	2019-07-19	1	-1/+1
\| \| \| \| \|	Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Acked-by: Samuel Pitoiset <[email protected]>
*	radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming	Marek Olšák	2019-07-19	1	-1/+3
\| \| \| \| \|	Acked-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>