mesa.git

	Commit message	Author	Age	Files	Lines
*	radeonsi/nir: Use nir stripping pass	Connor Abbott	2019-03-12	1	-0/+5
\| \| \| \| \| \| \| \| \|	This reduces compilation time for my shader-db collection from around 40 seconds to 30, vs. 19 seconds for TGSI. There are still some shaders that TGSI caches but NIR doesn't, partly because of more aggressive cross-stage optimizations with NIR. Reviewed-by: Timothy Arceri <[email protected]>
*	nir: Add a stripping pass for improved cacheability	Connor Abbott	2019-03-12	4	-0/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Oftentimes various nir shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
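The effect of stripping on cache hits can be illustrated with a minimal sketch. The `Shader` class, its fields, and the hashing scheme below are hypothetical stand-ins, not Mesa's real structures or cache key format; the only idea taken from the commit message is that fields such as the program name do not affect generated code but do change the serialized form.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Shader:
    # Hypothetical stand-in for a lowered shader, not Mesa's nir_shader.
    name: str            # program name: irrelevant to generated code
    label: str           # debug label: also irrelevant to generated code
    instructions: tuple  # the part a backend would actually compile


def strip(shader: Shader) -> Shader:
    """Return a copy with the fields that do not affect code generation cleared."""
    return replace(shader, name="", label="")


def cache_key(shader: Shader) -> str:
    """Hash every field, so any differing field produces a different key."""
    blob = repr((shader.name, shader.label, shader.instructions)).encode()
    return hashlib.sha1(blob).hexdigest()


a = Shader("GLSL12", "vs", ("load_input", "fmul", "store_output"))
b = Shader("GLSL57", "vs", ("load_input", "fmul", "store_output"))
```

Without stripping, `a` and `b` hash differently even though a backend would emit the same code for both; after `strip()` they share one key.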
*	radv: fix pointSizeRange limits	Samuel Pitoiset	2019-03-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The values should match the ones that are emitted. This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*. Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	iris: Flag fewer dirty bits in BLORP	Sagar Ghuge	2019-03-11	1	-3/+27
\| \| \| \| \| \| \| \| \|	v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke) 2) Add missing flags (Kenneth Graunke) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	st/glsl_to_nir: fix incorrect array access	Timothy Arceri	2019-03-12	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a segfault when we try to index the array with -1 when the array wasn't allocated in the first place. Before 7536af670b75 we would just access a pre-allocated array that was also load/stored to/from the shader cache. But now the cache will no longer allocate these arrays if they are empty. The change resulted in tests such as the following segfaulting when run with a warm shader cache: tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
*	nir: silence a couple new compiler warnings	Brian Paul	2019-03-12	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
	../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
	../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘\|\|’ [-Wparentheses]
	    if (ind == NULL \|\| ind && (ind)->type != basic_induction \|\|
	    ^
	[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
	../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
	../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
	    nir_cf_node unroll_loc =
	    ^
	Reviewed-by: Timothy Arceri <[email protected]>
*	panfrost: Identify fragment_extra flags	Alyssa Rosenzweig	2019-03-12	3	-10/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fragment_extra structure contains additional fields extending the MRT framebuffer descriptor, snuck in between the main framebuffer descriptor and the render targets. Its fields include those related to transaction elimination and depth/stencil buffers. This patch identifies the flags field (previously just "unk" with some magic values) as well as identifying some (but not all) flags set by the driver. The process of identifying flags brought a bug to light where transaction elimination (checksumming) could not be enabled unless AFBC was in-use. This issue is now resolved. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	panfrost: Document "depth-buffer writeback" bit	Alyssa Rosenzweig	2019-03-12	2	-1/+9
\| \| \| \| \| \| \| \| \|	This bit, if set, causes the depth buffer to be copied from GPU tile memory to the provided depth buffer in main memory. If not set, the GPU will not access the main memory (saving considerable memory bandwidth if depth results are not actually used). Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	panfrost: Support linear depth textures	Alyssa Rosenzweig	2019-03-12	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	This combination has not yet been seen "in the wild" in traces, but to support linear depth FBOs, ~bruteforce reveals this bit pattern is necessary. It's not yet clear why the meanings of 0x1 and 0x2 are essentially flipped (tiled vs linear for colour, linear vs some sort of tiled for depth). Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	panfrost: Allocate dedicated slab for linear BOs	Alyssa Rosenzweig	2019-03-12	2	-15/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, linear BOs shared memory with each other to minimize kernel round-trips / latency, as well as to work around a bug in the free_slab function. These concerns are invalid now, but continuing to use the slab allocator for BOs resulted in memory allocation errors. This issue was aggravated, though not introduced (so not a real regression) in the previous commit. v2 (unreviewed): Fix bug in v1 preventing munmaps from working Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	panfrost: Determine framebuffer format bits late	Alyssa Rosenzweig	2019-03-12	1	-17/+42
\| \| \| \| \| \| \| \| \| \| \|	Again, these formats are only properly known at the time of fragment job emit. Rather than hardcoding the format, at least for MFBD we begin to construct the format bits on-demand. This cleans up the code, futureproofs for ES3 framebuffer formats, and should fix bugs regarding FBO colour swizzles. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	panfrost: Delay color buffer setup	Alyssa Rosenzweig	2019-03-12	1	-43/+50
\| \| \| \| \| \| \| \| \|	In an effort to clean up framebuffer management code, we delay colour buffer setup until the FRAGMENT job is actually emitted, allowing the AFBC and linear codepaths to be unified. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	panfrost: Combine has_afbc/tiled in layout enum	Alyssa Rosenzweig	2019-03-12	3	-24/+64
\| \| \| \| \| \| \| \|	AFBC, tiled, and linear BO layouts are mutually exclusive; they should be coupled via a single enum rather than ad hoc checks of booleans. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
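A minimal sketch of the idea, assuming hypothetical names (`BoLayout`, `layout_from_flags`) that do not come from the driver: three mutually exclusive layouts are represented by one enum value, and the old pair of booleans maps onto it with the contradictory combination rejected.

```python
from enum import Enum, auto


class BoLayout(Enum):
    # Hypothetical names; the commit only states that the three
    # layouts are mutually exclusive.
    LINEAR = auto()
    TILED = auto()
    AFBC = auto()


def layout_from_flags(has_afbc: bool, tiled: bool) -> BoLayout:
    """Map the old pair of booleans onto the single enum."""
    if has_afbc and tiled:
        # With two booleans this state was representable; with the enum it is not.
        raise ValueError("AFBC and tiled are mutually exclusive")
    if has_afbc:
        return BoLayout.AFBC
    return BoLayout.TILED if tiled else BoLayout.LINEAR
```

Callers can then branch on one value instead of checking two booleans whose combination might be inconsistent.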
*	panfrost: Cleanup needless if in create_bo	Alyssa Rosenzweig	2019-03-12	1	-30/+26
\| \| \| \| \| \| \| \| \|	I'm not sure why we were checking for these additional criteria (likely inherited from some other driver); remove the needless checks to clean up the code and perhaps fix some bugs down the line. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Tomeu Vizoso <[email protected]>
*	i965: Reimplement all the PIPE_CONTROL rules.	Kenneth Graunke	2019-03-11	1	-136/+403
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system. The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable. Mark Janes noted that this patch helps with some GPU hangs on Icelake. This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work. v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Use genxml for emitting PIPE_CONTROL.	Kenneth Graunke	2019-03-11	7	-230/+362
\| \| \| \| \| \| \| \| \| \| \|	While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finicky as PIPE_CONTROL, the extra safety seems worth it. We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then pack them appropriately. Reviewed-by: Topi Pohjolainen <[email protected]>
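The split between abstract flags and per-generation packing might look roughly like the sketch below. Everything here is an assumption made for illustration: the flag names, the generation labels `genA`/`genB`, and the bit positions are invented, and the real layouts come from genxml.

```python
from enum import IntFlag


class PipeControl(IntFlag):
    # Abstract, hardware-independent flags; the values are arbitrary.
    CS_STALL = 1 << 0
    DEPTH_CACHE_FLUSH = 1 << 1
    WRITE_IMMEDIATE = 1 << 2


# Hypothetical per-generation bit positions, invented for this sketch.
FIELD_BITS = {
    "genA": {
        PipeControl.CS_STALL: 20,
        PipeControl.DEPTH_CACHE_FLUSH: 0,
        PipeControl.WRITE_IMMEDIATE: 14,
    },
    "genB": {
        PipeControl.CS_STALL: 21,
        PipeControl.DEPTH_CACHE_FLUSH: 1,
        PipeControl.WRITE_IMMEDIATE: 15,
    },
}


def pack(gen: str, flags: PipeControl) -> int:
    """Translate abstract flags into the bit layout of one generation."""
    dword = 0
    for flag, bit in FIELD_BITS[gen].items():
        if flags & flag:
            dword |= 1 << bit
    return dword
```

Callers only ever name the abstract flags, so a generation that moves a bit needs a change in one table rather than at every call site.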
*	i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.	Kenneth Graunke	2019-03-11	2	-2/+2
\| \| \| \| \| \|	Clearer name. Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move some genX infrastructure to genX_boilerplate.h.	Kenneth Graunke	2019-03-11	4	-128/+174
\| \| \| \| \| \| \|	This will let us make multiple genX_*.c files, without copy and pasting all this boilerplate. Reviewed-by: Topi Pohjolainen <[email protected]>
*	gallium/winsys/kms: fix incomplete type compilation failure	Brian Paul	2019-03-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Fixes: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’ templ->format, ^ Reviewed-by: Mathias Fröhlich <[email protected]>
*	drisw: fix incomplete type compilation failure	Brian Paul	2019-03-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Fixes: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’ offset = dri_sw_dt->stride * box->y; ^ Reviewed-by: Mathias Fröhlich <[email protected]>
*	st/mesa: minor refactoring of texture/sampler delete code	Brian Paul	2019-03-11	3	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Rename st_texture_free_sampler_views() to st_delete_texture_sampler_views() to align with st_DeleteTextureObject(), its only caller. Move the call to st_texture_release_all_sampler_views() from st_DeleteTextureObject() to st_delete_texture_sampler_views() so all the sampler view clean-up code is in one place. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: rename st_texture_release_sampler_view()	Brian Paul	2019-03-11	3	-5/+5
\| \| \| \| \| \| \|	To st_texture_release_context_sampler_view() to be more clear that it's context-specific. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: add/improve sampler view comments	Brian Paul	2019-03-11	1	-2/+8
\| \| \| \|	Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: move around some code in st_context.c	Brian Paul	2019-03-11	2	-122/+116
\| \| \| \| \| \| \| \| \| \| \| \|	st_init_driver_functions() is only called in st_context.c so there's no need for the prototype in st_context.h. To avoid a forward declaration of st_init_driver_functions() in st_context.c, we need to move around several other functions. No functional change. Reviewed-by: Neha Bhende <[email protected]>
*	st/mesa: move utility functions, macros into new st_util.h file	Brian Paul	2019-03-11	33	-91/+184
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	To de-clutter st_context.h. Clean up remaining function prototypes in st_context.h. The st_vp_uses_current_values() helper is only used in st_context.c so move it there. The st_get_active_states() function is only used in st_context.c so remove its prototype in st_context.h. Reviewed-by: Neha Bhende <[email protected]>
*	anv: destroy descriptor sets when pool gets reset	Juan A. Suarez Romero	2019-03-11	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	As stated in Vulkan spec: "Resetting a descriptor pool recycles all of the resources from all of the descriptor sets allocated from the descriptor pool back to the descriptor pool, and the descriptor sets are implicitly freed." This fixes dEQP-VK.api.descriptor_pool.* Fixes: 14f6275c92f1 "anv/descriptor_set: add reference counting for..." Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Clayton Craft <[email protected]>
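The spec behaviour quoted above can be modelled with a toy pool. This is a sketch under stated assumptions: `DescriptorPool` and its dictionary-based sets are hypothetical and bear no relation to the anv data structures.

```python
class DescriptorPool:
    """Toy model of a descriptor pool with a fixed number of sets."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.sets = []

    def allocate(self) -> dict:
        if len(self.sets) >= self.capacity:
            raise MemoryError("pool exhausted")
        descriptor_set = {"alive": True}
        self.sets.append(descriptor_set)
        return descriptor_set

    def reset(self) -> None:
        # Resetting implicitly frees every set allocated from the pool
        # and returns its resources to the pool.
        for descriptor_set in self.sets:
            descriptor_set["alive"] = False
        self.sets.clear()
```

If `reset()` only recycled the backing storage without destroying the sets, the old sets would stay alive and the pool would appear exhausted on the next round of allocations.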
*	nir: find induction/limit vars in iand instructions	Timothy Arceri	2019-03-12	1	-8/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will be used to help find the trip count of loops that look like the following:
	    while (a < x && i < 8) {
	        ...
	        i++;
	    }
	Where the NIR will end up looking something like this:
	    vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
	    loop {
	        ...
	        vec1 1 ssa_12 = ilt ssa_225, ssa_11
	        vec1 1 ssa_17 = ilt ssa_226, ssa_1
	        vec1 1 ssa_18 = iand ssa_12, ssa_17
	        vec1 1 ssa_19 = inot ssa_18
	        if ssa_19 {
	            ...
	            break
	        } else {
	            ...
	        }
	    }
	On RADV this unrolls a bunch of loops in F1-2017 shaders.
	Totals from affected shaders:
	SGPRS: 4112 -> 4136 (0.58 %)
	VGPRS: 4132 -> 4052 (-1.94 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 515444 -> 587720 (14.02 %) bytes
	LDS: 2 -> 2 (0.00 %) blocks
	Max Waves: 194 -> 196 (1.03 %)
	Wait states: 0 -> 0 (0.00 %)
	It also unrolls a couple of loops in shader-db on radeonsi.
	Totals from affected shaders:
	SGPRS: 128 -> 128 (0.00 %)
	VGPRS: 64 -> 64 (0.00 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 6880 -> 9504 (38.14 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 16 -> 16 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Ian Romanick <[email protected]>
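A rough sketch of the idea, not the actual loop-analysis code: among the operands of the `iand`, look for one that compares an induction variable against a constant, and derive an upper bound on the trip count from it. The tuple encoding of comparisons and the helper names are assumptions made for this example, and `a` and `x` are treated as loop-invariant.

```python
def find_induction_term(terms, induction_vars, constants):
    """Return (var, limit) for the first operand of the form `ilt var, const`."""
    for op, lhs, rhs in terms:
        if op == "ilt" and lhs in induction_vars and rhs in constants:
            return lhs, constants[rhs]
    return None


def max_trip_count(init: int, step: int, limit: int) -> int:
    """Iterations of `for (i = init; i < limit; i += step)` with step > 0."""
    return max(0, -(-(limit - init) // step))  # integer ceiling division


def count_iterations(a: int, x: int) -> int:
    """Run `while (a < x && i < 8) { i++; }` and count the iterations."""
    i = 0
    while a < x and i < 8:
        i += 1
    return i


# The two operands of the iand, encoded as (op, lhs, rhs).
terms = [("ilt", "a", "x"), ("ilt", "i", "c8")]
induction_vars = {"i": (0, 1)}  # variable -> (initial value, step)
constants = {"c8": 8}
```

The other operand (`a < x`) can only end the loop earlier, so the count derived from `i < 8` is an upper bound on the trip count, not the exact value.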
*	nir: pass nir_op to calculate_iterations()	Timothy Arceri	2019-03-12	1	-7/+10
\| \| \| \| \| \| \| \|	Passing the op in, rather than reading it from the alu instruction, gives us some flexibility. In a following patch we instead pass the inverse op. Reviewed-by: Ian Romanick <[email protected]>
*	nir: add get_induction_and_limit_vars() helper to loop analysis	Timothy Arceri	2019-03-12	1	-15/+26
\| \| \| \| \| \| \|	This helps make find_trip_count() a little easier to follow but will also be used by a following patch. Reviewed-by: Ian Romanick <[email protected]>
*	nir: add helper to return inversion op of a comparison	Timothy Arceri	2019-03-12	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will be used to help find the trip count of loops that look like the following:
	    while (a < x && i < 8) {
	        ...
	        i++;
	    }
	Where the NIR will end up looking something like this:
	    vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
	    loop {
	        ...
	        vec1 1 ssa_12 = ilt ssa_225, ssa_11
	        vec1 1 ssa_17 = ilt ssa_226, ssa_1
	        vec1 1 ssa_18 = iand ssa_12, ssa_17
	        vec1 1 ssa_19 = inot ssa_18
	        if ssa_19 {
	            ...
	            break
	        } else {
	            ...
	        }
	    }
	So in order to find the trip count we need to find the inverse of ilt.
	Reviewed-by: Ian Romanick <[email protected]>
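As an illustration of what such a helper computes, here is a minimal sketch restricted to signed integer comparisons. The table and the `inverse_comparison` name are this example's own, and the ops are represented as plain strings rather than `nir_op` values.

```python
import operator

# Each comparison maps to the op that is true exactly when it is false.
INVERSE = {
    "ilt": "ige",
    "ige": "ilt",
    "ieq": "ine",
    "ine": "ieq",
}

# Reference semantics used only to check the table.
EVAL = {
    "ilt": operator.lt,
    "ige": operator.ge,
    "ieq": operator.eq,
    "ine": operator.ne,
}


def inverse_comparison(op: str) -> str:
    """Return the comparison equivalent to `inot(op a, b)`."""
    return INVERSE[op]
```

With this, a break condition of the form `inot (ilt i, limit)` can be analysed as `ige i, limit`. Floating-point comparisons are left out of the sketch on purpose, because NaN operands make `!(a < b)` differ from `a >= b`.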
*	nir: simplify the loop analysis trip count code a little	Timothy Arceri	2019-03-12	1	-81/+82
\| \| \| \| \| \| \| \| \| \|	Here we create a helper is_supported_terminator_condition() and use that rather than embedding all the trip count code inside a switch. The new helper will also be used in a following patch. Reviewed-by: Ian Romanick <[email protected]>
*	nir: unroll some loops with a variable limit	Timothy Arceri	2019-03-12	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some loops have a single terminator but the exact trip count is still unknown. For example:
	    for (int i = 0; i < imin(x, 4); i++)
	        ...
	Shader-db results radeonsi (all affected are from Tropico 5):
	Totals from affected shaders:
	SGPRS: 144 -> 152 (5.56 %)
	VGPRS: 124 -> 108 (-12.90 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 5180 -> 6640 (28.19 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 17 -> 21 (23.53 %)
	Wait states: 0 -> 0 (0.00 %)
	Shader-db results i965 (SKL):
	total loops in shared programs: 3808 -> 3802 (-0.16%)
	loops in affected programs: 6 -> 0
	helped: 6
	HURT: 0
	vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):
	Totals from affected shaders:
	SGPRS: 304 -> 304 (0.00 %)
	VGPRS: 296 -> 292 (-1.35 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 15756 -> 25884 (64.28 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 29 -> 29 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	v2: fix bug where last iteration would get optimised away by mistake.
	Reviewed-by: Ian Romanick <[email protected]>
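The transformation can be pictured with a hand-written sketch: since the constant operand caps the loop at four iterations, the body can be copied four times while each copy keeps its exit check on the variable limit. The functions below are an illustration of that equivalence, not output of the actual pass.

```python
def looped(x: int) -> list:
    """Reference: `for (int i = 0; i < imin(x, 4); i++) visit(i);`"""
    visited = []
    i = 0
    while i < min(x, 4):
        visited.append(i)
        i += 1
    return visited


def unrolled(x: int) -> list:
    """Four copies of the body, each still guarded by the terminator."""
    visited = []
    limit = min(x, 4)
    if not 0 < limit:
        return visited
    visited.append(0)
    if not 1 < limit:
        return visited
    visited.append(1)
    if not 2 < limit:
        return visited
    visited.append(2)
    if not 3 < limit:
        return visited
    # The final copy must be kept; dropping it would lose the last iteration.
    visited.append(3)
    # No loop remains: the constant operand caps the trip count at 4.
    return visited
```

Both functions visit the same indices for every `x`, including values of `x` above the constant, where all four copies execute.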
*	nir: calculate trip count for more loops	Timothy Arceri	2019-03-12	3	-6/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example: for (int i = 0; i < imin(x, 4); i++) ... We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition. Reviewed-by: Ian Romanick <[email protected]>
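A small sketch of why the constant operand of the `imin` gives a usable bound. The helper names are hypothetical and the induction variable is assumed to have a positive step.

```python
def trip_count_upper_bound(constant_limit: int, init: int = 0, step: int = 1) -> int:
    """Upper bound for `for (i = init; i < imin(x, constant_limit); i += step)`."""
    # Integer ceiling of (constant_limit - init) / step, clamped at zero.
    return max(0, -(-(constant_limit - init) // step))


def actual_trip_count(x: int, constant_limit: int = 4) -> int:
    """Run the example loop for a concrete x and count its iterations."""
    count = 0
    i = 0
    while i < min(x, constant_limit):
        count += 1
        i += 1
    return count
```

Whatever value `x` takes at run time, `imin(x, 4)` never exceeds 4, so the loop runs at most `trip_count_upper_bound(4)` times; the exact count is only known when `x` is.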
*	nir: add partial loop unrolling support	Timothy Arceri	2019-03-12	1	-8/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds partial loop unrolling support and makes use of a guessed trip count based on array access. The code is written so that we could use partial unrolling more generally, but for now it's only used when we have guessed the trip count. We use partial unrolling for this guessed trip count because it's possible any out of bounds array access doesn't otherwise affect the shader, e.g. the stores/loads to/from the array are unused. So we insert a copy of the loop in the innermost continue branch of the unrolled loop. Later on it's possible for nir_opt_dead_cf() to then remove the loop in some cases.
	A Renderdoc capture from the Rise of the Tomb Raider benchmark reports the following change in an affected compute shader:
	GPU duration: 350 -> 325 microseconds
	shader-db results radeonsi VEGA (NIR backend):
	SGPRS: 1008 -> 816 (-19.05 %)
	VGPRS: 684 -> 432 (-36.84 %)
	Spilled SGPRs: 539 -> 0 (-100.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 39708 -> 45812 (15.37 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 105 -> 144 (37.14 %)
	Wait states: 0 -> 0 (0.00 %)
	shader-db results i965 SKL:
	total instructions in shared programs: 13098265 -> 13103359 (0.04%)
	instructions in affected programs: 5126 -> 10220 (99.38%)
	helped: 0
	HURT: 21
	total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
	cycles in affected programs: 289252 -> 234925 (-18.78%)
	helped: 12
	HURT: 9
	vkpipeline-db results VEGA:
	Totals from affected shaders:
	SGPRS: 184 -> 184 (0.00 %)
	VGPRS: 448 -> 448 (0.00 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 26076 -> 24428 (-6.32 %) bytes
	LDS: 6 -> 6 (0.00 %) blocks
	Max Waves: 5 -> 5 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Ian Romanick <[email protected]>
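The shape of a partially unrolled loop can be sketched by hand as follows. This is an illustration with a guessed trip count of 2 and a made-up loop body, not the output of the real pass: the body is copied once per guessed iteration, and a copy of the original loop sits in the innermost continue branch to handle any iterations beyond the guess.

```python
def original(n: int) -> list:
    """Reference loop: visit 0 .. n-1, breaking when i >= n."""
    out = []
    i = 0
    while True:
        if i >= n:
            break
        out.append(i)
        i += 1
    return out


def partially_unrolled(n: int) -> list:
    """Same loop, unrolled for a guessed trip count of 2."""
    out = []
    i = 0
    if not i >= n:              # first unrolled copy
        out.append(i)
        i += 1
        if not i >= n:          # second unrolled copy
            out.append(i)
            i += 1
            # Innermost continue branch: a copy of the original loop
            # covers the case where the guess was too low.
            while True:
                if i >= n:
                    break
                out.append(i)
                i += 1
    return out
```

Because the remainder loop is kept, the result is correct even when the guess is wrong. If later passes can prove the remainder never runs, it can be removed as dead control flow.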
*	nir: add new partially_unrolled bool to nir_loop	Timothy Arceri	2019-03-12	2	-0/+2
\| \| \| \| \| \| \| \| \| \|	To stop the same loop from being partially unrolled over and over, we add the bool partially_unrolled to nir_loop. We add it here rather than in nir_loop_info because nir_loop_info is only set via loop analysis and is intended to be cleared before each analysis. Also, nir_loop_info is never cloned. Reviewed-by: Ian Romanick <[email protected]>
*	nir: add guess trip count support to loop analysis	Timothy Arceri	2019-03-12	2	-6/+86
\| \| \| \| \| \| \| \| \| \| \| \|	This detects an induction variable used as an array index to guess the trip count of the loop. This enables us to do a partial unroll of the loop, which can eventually result in the loop being eliminated. v2: check if the induction var is used to index more than a single array and if so get the size of the smallest array. Reviewed-by: Ian Romanick <[email protected]>
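The v2 rule can be written down as a tiny sketch. The function name and the plain list of sizes are assumptions for illustration; the real analysis works on NIR derefs rather than on a list.

```python
def guess_trip_count(indexed_array_sizes):
    """Guess a trip count from the arrays indexed by the induction variable.

    Returns the size of the smallest such array, or None when the
    induction variable indexes no array and no guess is possible.
    """
    if not indexed_array_sizes:
        return None
    return min(indexed_array_sizes)
```

Taking the smallest array is the conservative choice: iterating further would index past the end of at least one of the arrays.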
*	panfrost: Add support for PAN_MESA_DEBUG	Tomeu Vizoso	2019-03-12	6	-27/+88
\| \| \| \| \|	Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
*	panfrost/midgard: Add support for MIDGARD_MESA_DEBUG	Tomeu Vizoso	2019-03-12	2	-22/+50
\| \| \| \| \|	Signed-off-by: Tomeu Vizoso <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
*	nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'	Xavier Bouchoux	2019-03-11	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field, when the image is not sampled and the value is not needed. Previously, shaders failed with:
	    SPIR-V parsing FAILED:
	    In file ../src/compiler/spirv/spirv_to_nir.c:1412
	    !is_shadow
	    632 bytes into the SPIR-V binary
	Reviewed-by: Jason Ekstrand <[email protected]>
*	iris: Fix write enable in pinning of depth/stencil resources	Kenneth Graunke	2019-03-11	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We may bind new Z/S buffers (which come via the framebuffer CSO, triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled. The next draw may enable Z or S writes (which come via the ZSA CSO, triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update our pin to have the write flag. So, update pinning if either dirty flag changes. To clarify, pass cso_zsa to the pinning function rather than pulling the random values out of ice->state, which unfortunately have to exist for the resolve code since iris_depth_stencil_alpha_state only exists in iris_state.c.
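The rule "update pinning if either dirty flag changes" can be sketched as below. Only the two flag names come from the commit message; their bit values, the function names, and the tuple return value are hypothetical.

```python
# Hypothetical bit values; only the flag names come from the commit message.
IRIS_DIRTY_DEPTH_BUFFER = 1 << 0
IRIS_DIRTY_WM_DEPTH_STENCIL = 1 << 1


def needs_repin(dirty: int) -> bool:
    """Re-pin when the Z/S buffers or the write enables may have changed."""
    return bool(dirty & (IRIS_DIRTY_DEPTH_BUFFER | IRIS_DIRTY_WM_DEPTH_STENCIL))


def pin_depth_stencil(dirty: int, depth_writes: bool, stencil_writes: bool):
    """Return (z_writable, s_writable) for the new pin, or None if unchanged."""
    if not needs_repin(dirty):
        return None
    # The write flags come from the depth/stencil/alpha state, not the buffer.
    return (depth_writes, stencil_writes)
```

Checking only the buffer flag would miss the case described above, where the same buffers stay bound and a later draw turns writes on.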
*	iris: Refactor depth/stencil buffer pinning into a helper.	Kenneth Graunke	2019-03-11	1	-37/+28
\| \| \| \| \| \| \|	This avoids the code duplication that caused me to put things in the wrong place in the previous commit. One used to have extra flushes, but we moved those out so now these are identical and can be easily shared.
*	iris: Move depth/stencil flushes so they actually do something	Kenneth Graunke	2019-03-11	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d6dd57d43cd (iris: Add missing depth cache flushes) added the depth/stencil flushes to the wrong place. I meant to add them to the iris_upload_dirty_render_state code that emits the packets, but I accidentally added them to the nearly identical looking code in iris_restore_render_saved_bos. This meant we missed the actual flushing at draw time, but instead did pointless flushing on the first draw in a batch where things are already flushed anyway. This commit moves them to iris_resolve.c, next to the depth prepares, similar to what we do for color buffers. i965 does them elsewhere, but I'm not sure why - this seems like the most consistent place.
*	st/dri: allow direct UYVY import	Christian Gmeiner	2019-03-11	1	-0/+2
\| \| \| \| \| \| \|	Push this format to the pipe driver unchanged. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	iris: Fix TES gl_PatchVerticesIn handling.	Kenneth Graunke	2019-03-11	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. If we switch the TCS for one with a different number of output vertices, then the TES's gl_PatchVerticesIn value will change. We need to re-upload in this case. For now, re-emit constants whenever the TCS/TES are swapped out.
	2. If there is no TCS, then we can't grab gl_PatchVerticesIn from the TCS info. Since it's a passthrough, we can just use the primitive's patch count (like the TCS gl_PatchVerticesIn does).
	Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
	Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
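Both points can be condensed into a short sketch. The function names and the use of `None` for "no TCS bound" are assumptions of this example, not driver code.

```python
def tes_patch_vertices_in(tcs_vertices_out, primitive_patch_vertices: int) -> int:
    """Value the TES should see for gl_PatchVerticesIn.

    tcs_vertices_out is None when no TCS is bound (passthrough case).
    """
    if tcs_vertices_out is not None:
        return tcs_vertices_out
    return primitive_patch_vertices


def needs_constant_reupload(old_value: int, new_value: int) -> bool:
    """Swapping the TCS can change the value, so constants must be re-sent."""
    return old_value != new_value
```

The commit itself takes the simpler route of re-emitting constants whenever the TCS or TES is swapped out, which covers every case the comparison above would catch.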
*	iris: Rework default tessellation level uploads	Kenneth Graunke	2019-03-11	2	-39/+33
\| \| \| \| \| \| \| \| \| \| \|	Now that we've added a system value uploading mechanism, we may as well reuse the same system for default tessellation levels. This simplifies the state upload code a bit. Also fixes: KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	iris: Face should be a system value.	Timur Kristóf	2019-03-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which despite its name is not a TGSI-specific capability, just lets the state tracker know that it should generate a system value for FACE. This is needed if we want to run tgsi_to_nir on iris. Signed-off-by: Timur Kristóf <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	vc4: Switch the post-RA scheduler over to the DAG datastructure.	Eric Anholt	2019-03-11	1	-110/+73
\| \| \| \|	Just a small code reduction from shared infrastructure.
*	v3d: Use the DAG datastructure for QPU instruction scheduling.	Eric Anholt	2019-03-11	2	-117/+75
\| \| \| \|	Just a small code reduction from shared infrastructure.
*	vc4: Reuse list_for_each_entry_rev().	Eric Anholt	2019-03-11	1	-2/+2
\|
*	v3d: Reuse list_for_each_entry_rev().	Eric Anholt	2019-03-11	1	-2/+2
\|