mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	nir: Add ir3-specific version of most SSBO intrinsics	Eduardo Lima Mitev	2019-03-13	1	-0/+27
	These are ir3-specific versions of the SSBO intrinsics that add an extra source to hold the element offset (in dwords), which is what the backend instructions need. The original byte-offset source provided by NIR is not replaced because the backend still needs it on a4xx and a5xx.
	Reviewed-by: Rob Clark <[email protected]>
*	nir: Add a pass to combine store_derefs to same vector	Caio Marcelo de Oliveira Filho	2019-03-13	5	-0/+579
	v2: (all from Jason)
	- Reuse existing function for the end of the block combinations.
	- Check the SSA values are coming from the right place in tests.
	- Document the case when the store to array_deref is reused.
	Reviewed-by: Jason Ekstrand <[email protected]>
*	glsl/lower_vector_derefs: Don't use a temporary for TCS outputs	Jason Ekstrand	2019-03-13	1	-10/+64
	Tessellation control shader outputs act as if they have memory backing them, and you can have multiple writes to different components of the same vector in flight at the same time. When this happens, the load-vec-store pattern used by ir_triop_vector_insert doesn't yield the correct results. Instead, just emit a sequence of conditional assignments.
	Reviewed-by: Ian Romanick <[email protected]>
	Cc: [email protected]
*	glsl/list: Add a list variant of insert_after	Jason Ekstrand	2019-03-13	1	-0/+26
	Reviewed-by: Ian Romanick <[email protected]> Caio Marcelo de Oliveira Filho <[email protected]>
*	nir/loop_unroll: Fix out-of-bounds access handling	Jason Ekstrand	2019-03-12	1	-12/+2
	The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic.
	Fixes: e8a8937a04 "nir: add partial loop unrolling support"
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir: Add a pass for lowering IO back to vector when possible	Jason Ekstrand	2019-03-12	5	-1/+392
	This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible.
	Reviewed-by: Connor Abbott <[email protected]>
	Cc: "19.0" <[email protected]>
*	nir: Add a stripping pass for improved cacheability	Connor Abbott	2019-03-12	4	-0/+111
	Oftentimes various NIR shaders will be the same, or almost the same, after lowering. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable.
	Reviewed-by: Timothy Arceri <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: silence a couple new compiler warnings	Brian Paul	2019-03-12	2	-2/+2
	[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
	../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
	../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
	   if (ind == NULL || ind && (ind)->type != basic_induction ||
	   ^
	[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
	../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
	../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
	   nir_cf_node unroll_loc =
	   ^
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir: find induction/limit vars in iand instructions	Timothy Arceri	2019-03-12	1	-8/+91
	This will be used to help find the trip count of loops that look like the following:
	   while (a < x && i < 8) {
	      ...
	      i++;
	   }
	Where the NIR will end up looking something like this:
	   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
	   loop {
	      ...
	      vec1 1 ssa_12 = ilt ssa_225, ssa_11
	      vec1 1 ssa_17 = ilt ssa_226, ssa_1
	      vec1 1 ssa_18 = iand ssa_12, ssa_17
	      vec1 1 ssa_19 = inot ssa_18
	      if ssa_19 {
	         ...
	         break
	      } else {
	         ...
	      }
	   }
	On RADV this unrolls a bunch of loops in F1-2017 shaders.
	Totals from affected shaders:
	SGPRS: 4112 -> 4136 (0.58 %)
	VGPRS: 4132 -> 4052 (-1.94 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 515444 -> 587720 (14.02 %) bytes
	LDS: 2 -> 2 (0.00 %) blocks
	Max Waves: 194 -> 196 (1.03 %)
	Wait states: 0 -> 0 (0.00 %)
	It also unrolls a couple of loops in shader-db on radeonsi.
	Totals from affected shaders:
	SGPRS: 128 -> 128 (0.00 %)
	VGPRS: 64 -> 64 (0.00 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 6880 -> 9504 (38.14 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 16 -> 16 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: pass nir_op to calculate_iterations()	Timothy Arceri	2019-03-12	1	-7/+10
	Passing the op explicitly, rather than getting it from the ALU instruction, allows us some flexibility. In the following patch we instead pass the inverse op.
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: add get_induction_and_limit_vars() helper to loop analysis	Timothy Arceri	2019-03-12	1	-15/+26
	This helps make find_trip_count() a little easier to follow and will also be used by a following patch.
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: add helper to return inversion op of a comparison	Timothy Arceri	2019-03-12	1	-0/+29
	This will be used to help find the trip count of loops that look like the following:
	   while (a < x && i < 8) {
	      ...
	      i++;
	   }
	Where the NIR will end up looking something like this:
	   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
	   loop {
	      ...
	      vec1 1 ssa_12 = ilt ssa_225, ssa_11
	      vec1 1 ssa_17 = ilt ssa_226, ssa_1
	      vec1 1 ssa_18 = iand ssa_12, ssa_17
	      vec1 1 ssa_19 = inot ssa_18
	      if ssa_19 {
	         ...
	         break
	      } else {
	         ...
	      }
	   }
	So in order to find the trip count we need to find the inverse of ilt.
	Reviewed-by: Ian Romanick <[email protected]>
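	As a rough illustration of what "the inverse of ilt" means, here is a minimal C sketch. The `cmp_op` enum and the `invert_cmp()` and `eval_cmp()` names are hypothetical stand-ins invented for this example, not the NIR opcodes or the helper added by the commit. The idea is that the loop breaks when `inot (a < b)` holds, which is the same as `a >= b`.

```c
#include <assert.h>

/* Hypothetical comparison opcodes for illustration; not NIR's real enum. */
enum cmp_op { CMP_ILT, CMP_IGE, CMP_IEQ, CMP_INE, CMP_INVALID };

/* Return the op that yields the logical negation for the same operands. */
enum cmp_op invert_cmp(enum cmp_op op)
{
   switch (op) {
   case CMP_ILT: return CMP_IGE;   /* !(a <  b)  ==  a >= b */
   case CMP_IGE: return CMP_ILT;   /* !(a >= b)  ==  a <  b */
   case CMP_IEQ: return CMP_INE;
   case CMP_INE: return CMP_IEQ;
   default:      return CMP_INVALID;
   }
}

/* Evaluate one of the hypothetical ops on two signed integers. */
int eval_cmp(enum cmp_op op, int a, int b)
{
   switch (op) {
   case CMP_ILT: return a < b;
   case CMP_IGE: return a >= b;
   case CMP_IEQ: return a == b;
   case CMP_INE: return a != b;
   default:
      assert(!"unknown comparison op");
      return 0;
   }
}
```

	For the loop above, the exit condition `inot (i < 8)` can then be analysed as `i >= 8` without any special handling of the `inot`.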
*	nir: simplify the loop analysis trip count code a little	Timothy Arceri	2019-03-12	1	-81/+82
	Here we create a helper is_supported_terminator_condition() and use that rather than embedding all the trip count code inside a switch. The new helper will also be used in a following patch.
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: unroll some loops with a variable limit	Timothy Arceri	2019-03-12	1	-0/+55
	Some loops can have a single terminator but the exact trip count is still unknown. For example:
	   for (int i = 0; i < imin(x, 4); i++)
	      ...
	Shader-db results radeonsi (all affected are from Tropico 5):
	Totals from affected shaders:
	SGPRS: 144 -> 152 (5.56 %)
	VGPRS: 124 -> 108 (-12.90 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 5180 -> 6640 (28.19 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 17 -> 21 (23.53 %)
	Wait states: 0 -> 0 (0.00 %)
	Shader-db results i965 (SKL):
	total loops in shared programs: 3808 -> 3802 (-0.16%)
	loops in affected programs: 6 -> 0
	helped: 6
	HURT: 0
	vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):
	Totals from affected shaders:
	SGPRS: 304 -> 304 (0.00 %)
	VGPRS: 296 -> 292 (-1.35 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 15756 -> 25884 (64.28 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 29 -> 29 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	v2: fix bug where last iteration would get optimised away by mistake.
	Reviewed-by: Ian Romanick <[email protected]>
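	To make the idea concrete, here is a small C sketch of what unrolling such a loop could look like at the source level. It is only an illustration under assumptions: `sum_rolled()`, `sum_unrolled()` and `imin()` are names invented for this example, and the real pass works on NIR control flow, not on C source. Because the limit is `imin(x, 4)`, the loop never runs more than 4 times, so it can be unrolled 4 times as long as each copy keeps its exit test against `x`.

```c
#include <assert.h>
#include <stddef.h>

int imin(int a, int b) { return a < b ? a : b; }

/* Original loop: the trip count depends on x but is never more than 4. */
int sum_rolled(const int *v, int x)
{
   int sum = 0;
   for (int i = 0; i < imin(x, 4); i++)
      sum += v[i];
   return sum;
}

/* Unrolled by the constant bound; every copy keeps the exit test on x,
 * including the last one, so no iteration is dropped by mistake. */
int sum_unrolled(const int *v, int x)
{
   assert(v != NULL);
   int sum = 0;
   if (0 >= x) return sum;
   sum += v[0];
   if (1 >= x) return sum;
   sum += v[1];
   if (2 >= x) return sum;
   sum += v[2];
   if (3 >= x) return sum;
   sum += v[3];
   return sum;
}
```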
*	nir: calculate trip count for more loops	Timothy Arceri	2019-03-12	3	-6/+49
	This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example:
	   for (int i = 0; i < imin(x, 4); i++)
	      ...
	We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition.
	Reviewed-by: Ian Romanick <[email protected]>
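	The arithmetic behind this can be sketched in C as follows. The function names are hypothetical and the sketch assumes an induction variable that starts at 0 and increments by 1, as in the example. The exact trip count depends on the runtime value `x`, but the constant operand of the min gives a bound that is known at compile time.

```c
#include <assert.h>

/* Iterations of: for (int i = 0; i < min(x, limit); i++)
 * Exact value; needs the runtime value of x. */
int exact_trip_count(int x, int limit)
{
   int bound = x < limit ? x : limit;
   return bound > 0 ? bound : 0;
}

/* Upper bound that needs only the constant operand of the min. */
int max_trip_count(int limit)
{
   return limit > 0 ? limit : 0;
}
```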
*	nir: add partial loop unrolling support	Timothy Arceri	2019-03-12	1	-8/+199
	This adds partial loop unrolling support and makes use of a guessed trip count based on array access. The code is written so that we could use partial unrolling more generally, but for now it's only used when we have guessed the trip count.
	We use partial unrolling for this guessed trip count because it's possible any out-of-bounds array access doesn't otherwise affect the shader, e.g. the stores/loads to/from the array are unused. So we insert a copy of the loop in the innermost continue branch of the unrolled loop. Later on it's possible for nir_opt_dead_cf() to then remove the loop in some cases.
	A Renderdoc capture from the Rise of the Tomb Raider benchmark reports the following change in an affected compute shader:
	GPU duration: 350 -> 325 microseconds
	shader-db results radeonsi VEGA (NIR backend):
	SGPRS: 1008 -> 816 (-19.05 %)
	VGPRS: 684 -> 432 (-36.84 %)
	Spilled SGPRs: 539 -> 0 (-100.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 39708 -> 45812 (15.37 %) bytes
	LDS: 0 -> 0 (0.00 %) blocks
	Max Waves: 105 -> 144 (37.14 %)
	Wait states: 0 -> 0 (0.00 %)
	shader-db results i965 SKL:
	total instructions in shared programs: 13098265 -> 13103359 (0.04%)
	instructions in affected programs: 5126 -> 10220 (99.38%)
	helped: 0
	HURT: 21
	total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
	cycles in affected programs: 289252 -> 234925 (-18.78%)
	helped: 12
	HURT: 9
	vkpipeline-db results VEGA:
	Totals from affected shaders:
	SGPRS: 184 -> 184 (0.00 %)
	VGPRS: 448 -> 448 (0.00 %)
	Spilled SGPRs: 0 -> 0 (0.00 %)
	Spilled VGPRs: 0 -> 0 (0.00 %)
	Private memory VGPRs: 0 -> 0 (0.00 %)
	Scratch size: 0 -> 0 (0.00 %) dwords per thread
	Code Size: 26076 -> 24428 (-6.32 %) bytes
	LDS: 6 -> 6 (0.00 %) blocks
	Max Waves: 5 -> 5 (0.00 %)
	Wait states: 0 -> 0 (0.00 %)
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: add new partially_unrolled bool to nir_loop	Timothy Arceri	2019-03-12	2	-0/+2
	In order to stop continuously partially unrolling the same loop we add the bool partially_unrolled to nir_loop. We add it here rather than in nir_loop_info because nir_loop_info is only set via loop analysis and is intended to be cleared before each analysis. Also, nir_loop_info is never cloned.
	Reviewed-by: Ian Romanick <[email protected]>
*	nir: add guess trip count support to loop analysis	Timothy Arceri	2019-03-12	2	-6/+86
	This detects an induction variable used as an array index to guess the trip count of the loop. This enables us to do a partial unroll of the loop, which can eventually result in the loop being eliminated.
	v2: check if the induction var is used to index more than a single array and if so get the size of the smallest array.
	Reviewed-by: Ian Romanick <[email protected]>
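	The v2 rule (use the size of the smallest array indexed by the induction variable) can be sketched in C like this. `guess_trip_count()` and its parameters are hypothetical names for this example, and returning 0 to mean "no guess" is an assumption of the sketch, not something stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Given the lengths of all arrays indexed by the induction variable,
 * guess that the loop runs no further than the smallest one.
 * Returns 0 when there is nothing to base a guess on. */
unsigned guess_trip_count(const unsigned *array_lengths, size_t num_arrays)
{
   assert(array_lengths != NULL || num_arrays == 0);

   unsigned guess = 0;
   for (size_t n = 0; n < num_arrays; n++) {
      if (guess == 0 || array_lengths[n] < guess)
         guess = array_lengths[n];
   }
   return guess;
}
```

	Taking the smallest array is the conservative choice: any iteration past that length would index out of bounds on at least one of the arrays.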
*	nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'	Xavier Bouchoux	2019-03-11	1	-3/+4
	The 'dxc' HLSL-to-SPIR-V compiler appears to emit 2 (Unknown) in the depth field when the image is not sampled and the value is not needed.
	Previously, shaders failed with:
	   SPIR-V parsing FAILED:
	   In file ../src/compiler/spirv/spirv_to_nir.c:1412
	   !is_shadow
	   632 bytes into the SPIR-V binary
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/serialize: Prevent writing uninitialized state_slot data	Connor Abbott	2019-03-11	1	-5/+14
	The nir_state_slot struct had some padding that was never initialized. Serializing the individual parts of the struct is more robust and avoids the overhead of zeroing it at creation, so just do that.
	Reviewed-by: Jason Ekstrand <[email protected]>
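	The difference between copying a whole struct and serializing its fields one by one can be sketched in C as follows. `struct slot`, `SLOT_WIRE_SIZE` and `write_slot()` are hypothetical names chosen for this example; the layout only resembles the kind of struct described above and is not the real nir_state_slot definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: on common ABIs the compiler inserts two padding
 * bytes after `tokens` so that `swizzle` is 4-byte aligned. */
struct slot {
   uint16_t tokens[3];
   int32_t swizzle;
};

/* Bytes written per slot: only real fields, no padding. */
#define SLOT_WIRE_SIZE (3 * sizeof(uint16_t) + sizeof(int32_t))

/* Write each field individually so uninitialized padding bytes never
 * reach the output buffer. Returns the number of bytes written. */
size_t write_slot(uint8_t *out, const struct slot *s)
{
   assert(out != NULL && s != NULL);

   size_t pos = 0;
   for (int i = 0; i < 3; i++) {
      memcpy(out + pos, &s->tokens[i], sizeof(s->tokens[i]));
      pos += sizeof(s->tokens[i]);
   }
   memcpy(out + pos, &s->swizzle, sizeof(s->swizzle));
   pos += sizeof(s->swizzle);
   return pos;
}
```

	With this approach two slots holding the same field values serialize to identical bytes regardless of what was in memory before the fields were assigned, which matters when the serialized form is hashed or compared.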
*	Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers)	Kenneth Graunke	2019-03-09	1	-24/+10
	This broke piles of image load store tests (179 failures on CI, mesa_master build #15546, previous build right before this landed was green). I'd rather not leave the tree on fire over the weekend, so let's revert for now, and we can figure out what happened next week.
*	nir/algebraic: Add missing 16-bit extract_[iu]8 patterns	Ian Romanick	2019-03-08	1	-0/+3
	No shader-db changes on any Intel platform.
	v2: Use a loop to generate patterns. Suggested by Jason.
	Reviewed-by: Matt Turner <[email protected]> [v1]
	Reviewed-by: Dylan Baker <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
*	nir/algebraic: Add missing 64-bit extract_[iu]8 patterns	Ian Romanick	2019-03-08	1	-0/+3
	No shader-db changes on any Intel platform.
	v2: Use a loop to generate patterns. Suggested by Jason.
	Reviewed-by: Matt Turner <[email protected]> [v1]
	Reviewed-by: Dylan Baker <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
*	nir/algebraic: Remove redundant extract_[iu]8 patterns	Ian Romanick	2019-03-08	1	-14/+4
	No shader-db changes on any Intel platform.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Dylan Baker <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
*	nir/algebraic: Fix up extract_[iu]8 after loop unrolling	Ian Romanick	2019-03-08	1	-2/+20
	Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
	total instructions in shared programs: 15256840 -> 15256837 (<.01%)
	instructions in affected programs: 4713 -> 4710 (-0.06%)
	helped: 3
	HURT: 0
	helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
	helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%
	total cycles in shared programs: 372286583 -> 372286583 (0.00%)
	cycles in affected programs: 198516 -> 198516 (0.00%)
	helped: 1
	HURT: 1
	helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
	helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
	HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
	HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%
	No changes on any other Intel platform.
	v2: Use a loop to generate patterns. Suggested by Jason.
	Reviewed-by: Matt Turner <[email protected]> [v1]
	Reviewed-by: Dylan Baker <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
*	nir/linker: fix ARRAY_SIZE query with xfb varyings	Alejandro Piñeiro	2019-03-08	1	-1/+2
	For a non-array varying, ARRAY_SIZE is expected to be 1, not 0.
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/linker: Fix TRANSFORM_FEEDBACK_BUFFER_INDEX	Antia Puentes	2019-03-08	1	-1/+11
	From the ARB_enhanced_layouts specification:
	   "For the property TRANSFORM_FEEDBACK_BUFFER_INDEX, a single integer identifying the index of the active transform feedback buffer associated with an active variable is written to <params>. For variables corresponding to the special names "gl_NextBuffer", "gl_SkipComponents1", "gl_SkipComponents2", "gl_SkipComponents3", and "gl_SkipComponents4", -1 is written to <params>."
	We were storing the xfb_buffer value, instead of the value corresponding to GL_TRANSFORM_FEEDBACK_BUFFER_INDEX.
	Note that the implementation assumes that varyings would be sorted by offset and buffer.
	Signed-off-by: Antia Puentes <[email protected]>
	Signed-off-by: Alejandro Piñeiro <[email protected]>
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/linker: use nir_gather_xfb_info	Alejandro Piñeiro	2019-03-08	1	-186/+54
	Use nir_gather_xfb_info instead of a custom ARB_gl_spirv xfb gather info pass. This is not only about reusing code: the custom code did not properly handle how many varyings are enumerated from some complex types, so this change also fixes some corner cases.
	v2: Use util_bitcount, simplify current stage check (Kenneth)
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/xfb: handle arrays and AoA of basic types	Alejandro Piñeiro	2019-03-08	1	-10/+32
	In OpenGL, an array of a simple type adds just one varying, so the gl_transform_feedback_varying_info struct defined in mtypes.h includes the parameters Type (base_type) and Size (number of elements). This commit checks for this when the recursive add_var_xfb_outputs call handles arrays, to ensure that just one varying is added. We also need to take AoA into account here.
	v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)
	v3: simplified AoA check, without the need for glsl_type_is_leaf, using glsl_types_is_struct (Timothy Arceri)
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir_types: add glsl_type_is_struct helper	Alejandro Piñeiro	2019-03-08	2	-0/+7
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/xfb: sort varyings too	Alejandro Piñeiro	2019-03-08	1	-2/+17
	Right now we only re-sort outputs, but it is better to sort varyings too, as the linker expects them to be sorted (as was done for GLSL). For varyings, to make it easier to compute buffer_index, we also sort by buffer. We could do the same for outputs, but we lack a reason to, so they are left as they are (sorted by offset only).
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/xfb: adding varyings on nir_xfb_info and gather_info	Alejandro Piñeiro	2019-03-08	2	-7/+44
	This is needed so that the info can be used for OpenGL (right now for ARB_gl_spirv). This commit adds two new structures:
	* nir_xfb_varying_info: identifies each individual varying. For each one, we need to know the type, buffer and xfb_offset.
	* nir_xfb_buffer_info: for each buffer, in addition to the stride, we now need to know how many varyings are assigned to it.
	For this patch, the only case where num_outputs != num_varyings is doubles, where a dvec3/4 could require more than one output. There are more cases though (like AoA), which will be handled in following patches.
	v2: updated after new nir general XFB support introduced for "anv: Add support for VK_EXT_transform_feedback"
	v3: compute num_varyings beforehand for allocating, instead of relying on num_outputs as approximate value (Timothy Arceri)
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir_types: add glsl_varying_count helper	Alejandro Piñeiro	2019-03-08	2	-0/+7
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/xfb: add component_offset at nir_xfb_info	Alejandro Piñeiro	2019-03-08	2	-0/+4
	Here component_offset is the offset used when accessing components of a packed variable; in other words, location_frac in nir.h. Different places in Mesa use different names for it. Technically a nir_xfb_info consumer can derive the same value from the component_mask, but it seems somewhat forced to make the consumer compute it instead of providing it.
	v2: rename local location_frac to comp_offset, closer to the intended use (Timothy Arceri)
	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/builder: Add a build_deref_array_imm helper	Jason Ekstrand	2019-03-07	7	-17/+25
	Unlike most of the cases in which we do this by hand, the new helper properly handles non-32-bit pointers.
	Reviewed-by: Karol Herbst <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir/builder: Cast array indices in build_deref_follower	Jason Ekstrand	2019-03-07	1	-1/+7
	There's no guarantee when build_deref_follower is called that the two derefs have the same bit size destination. Insert a cast on the array index in case we have differing bit sizes. While we're here, insert some asserts in build_deref_array and build_deref_ptr_as_array. The validator will catch violations here but they're easier to debug if we catch them while building.
	Reviewed-by: Karol Herbst <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
*	nir/builder: Emit better code for iadd/imul_imm	Jason Ekstrand	2019-03-07	2	-5/+24
	Because we already know the immediate right-hand parameter, we can potentially save the optimizer a bit of work.
	Reviewed-by: Karol Herbst <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
	Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
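	The commit message does not list which cases are special-cased, so the following C sketch only shows one plausible set of shortcuts a builder could take when the right-hand operand is a known immediate. The `emitted` enum and the `plan_iadd_imm()` and `plan_imul_imm()` names are hypothetical and exist only for this illustration; they describe which instruction would be emitted, they do not build any IR.

```c
#include <assert.h>
#include <stdint.h>

/* What a builder might emit for "x OP imm" (illustrative only). */
enum emitted {
   EMIT_NOTHING,     /* result is just x */
   EMIT_CONST_ZERO,  /* result is the constant 0 */
   EMIT_ISHL,        /* shift left by log2(imm) */
   EMIT_IADD,
   EMIT_IMUL,
};

/* x + 0 needs no instruction; anything else is a normal add. */
enum emitted plan_iadd_imm(uint64_t imm)
{
   return imm == 0 ? EMIT_NOTHING : EMIT_IADD;
}

/* x * 0 is 0, x * 1 is x, and a power of two can become a shift. */
enum emitted plan_imul_imm(uint64_t imm)
{
   if (imm == 0)
      return EMIT_CONST_ZERO;
   if (imm == 1)
      return EMIT_NOTHING;
   if ((imm & (imm - 1)) == 0)
      return EMIT_ISHL;
   return EMIT_IMUL;
}
```

	Deciding this at build time means the later algebraic optimization passes have fewer trivial patterns left to clean up.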
*	nir: free dead_ctx in case of no progress	Tapani Pälli	2019-03-07	1	-1/+3
	Fixes a leak:
	==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
	==7576==    at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
	==7576==    by 0x53EF0E4: ralloc_size (ralloc.c:119)
	==7576==    by 0x53EF0C2: ralloc_context (ralloc.c:113)
	==7576==    by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
	==7576==    by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)
	Signed-off-by: Tapani Pälli <[email protected]>
	Reviewed-by: Lionel Landwerlin <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
*	glsl: use NIR function inlining for drivers that use glsl_to_nir()	Timothy Arceri	2019-03-06	2	-2/+83
	glsl_to_nir() is still missing support for converting certain functions to NIR, so for those we use the GLSL IR optimisations to remove the functions.
	Reviewed-by: Eric Anholt <[email protected]>
*	glsl/freedreno/panfrost: pass gl_context to the standalone compiler	Timothy Arceri	2019-03-06	3	-5/+7
	This allows us to use the ctx with glsl_to_nir() in a following patch.
	Reviewed-by: Eric Anholt <[email protected]>
*	nir/lower_doubles: Inline functions directly in lower_doubles	Jason Ekstrand	2019-03-06	2	-17/+35
	Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ourselves. This means that there are no more nasty functions lying around that the caller needs to worry about cleaning up.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/deref: Expose nir_opt_deref_impl	Jason Ekstrand	2019-03-06	2	-1/+2
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/inline_functions: Break inlining into a builder helper	Jason Ekstrand	2019-03-06	3	-40/+60
	This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined, and only once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	glsl/nir: Inline functions in float64_funcs_to_nir	Jason Ekstrand	2019-03-06	1	-0/+5
	This doesn't really change anything as the functions will all get inlined anyway. However it does let us do a bit of the work earlier and in a common place.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	glsl/nir: Add a shared helper for building float64 shaders	Jason Ekstrand	2019-03-06	4	-0/+65
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Teach loop unrolling about 64-bit instruction lowering	Jason Ekstrand	2019-03-06	3	-13/+79
	The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions, potentially with control flow. If loop unrolling isn't aware of this, it can unroll, 20 times, a loop containing a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Expose double and int64 op_to_options_mask helpers	Jason Ekstrand	2019-03-06	3	-51/+23
	We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one.
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	compiler/nir: add an is_conversion field to nir_op_info	Iago Toral Quiroga	2019-03-06	3	-33/+47
	This is set to True only for numeric conversion opcodes.
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
	Reviewed-by: Jordan Justen <[email protected]>
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc()	Timothy Arceri	2019-03-06	21	-39/+39
	Replace done using:
	   find ./src -type f -exec sed -i -- \
	      's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;
	Acked-by: Karol Herbst <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
	Acked-by: Kenneth Graunke <[email protected]>
*	glsl: rename record_types -> struct_types	Timothy Arceri	2019-03-06	2	-10/+10
	Acked-by: Karol Herbst <[email protected]>
	Acked-by: Jason Ekstrand <[email protected]>
	Acked-by: Kenneth Graunke <[email protected]>