| Commit message | Author | Age | Files | Lines |
| |
This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function. The next patch will change
the API.
v2: Rebase after framebuffer fetch landed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
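As a rough illustration of the split described above, here is a minimal, self-contained sketch of the usual "entry point walks the functions, internals take an impl" pattern. The types and names below are stand-ins, not the real NIR API.

  #include <stdbool.h>
  #include <stddef.h>

  /* stand-in types; the real pass operates on nir_shader / nir_function_impl */
  struct toy_impl { int num_blocks; };
  struct toy_function { struct toy_impl *impl; };
  struct toy_shader { struct toy_function *functions; size_t num_functions; };

  /* pass internals: work on a single function implementation */
  static bool lower_toy_impl(struct toy_impl *impl)
  {
     return impl->num_blocks > 0;   /* per-impl rewriting would go here */
  }

  /* public entry point: iterate the shader and defer to the impl worker */
  static bool lower_toy_shader(struct toy_shader *shader)
  {
     bool progress = false;
     for (size_t i = 0; i < shader->num_functions; i++) {
        if (shader->functions[i].impl)
           progress |= lower_toy_impl(shader->functions[i].impl);
     }
     return progress;
  }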
| |
This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary. The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
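A minimal sketch of the two copy rules described above, using stand-in types rather than the actual lowering pass:

  #include <stdbool.h>

  struct toy_output {
     bool is_framebuffer_fetch;   /* previous framebuffer contents are read */
     bool is_read_only;           /* e.g. gl_LastFragData */
  };

  /* "temp = output" at the top of the program is only needed when the
   * output's initial value is defined (framebuffer fetch outputs) */
  static bool needs_initial_copy(const struct toy_output *out)
  {
     return out->is_framebuffer_fetch;
  }

  /* "output = temp" at the end must be skipped when a store output
   * intrinsic would be illegal (read-only outputs) */
  static bool needs_final_copy(const struct toy_output *out)
  {
     return !out->is_read_only;
  }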
| |
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics. The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows. Instead, we replace the recursion
with a pair of loops, one at the start and one at the end. This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
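For reference, a generic, self-contained sketch of replacing a recursive depth-first tree walk with loops (stand-in node type, not the actual NIR dominance-tree code):

  #include <stddef.h>

  struct toy_node {
     struct toy_node *parent;
     struct toy_node *first_child;
     struct toy_node *next_sibling;
  };

  static void visit(struct toy_node *n) { (void) n; /* per-node work */ }

  static void walk_tree(struct toy_node *root)
  {
     struct toy_node *n = root;
     while (n) {
        visit(n);
        if (n->first_child) {          /* keep descending while we can */
           n = n->first_child;
           continue;
        }
        /* climb back up until a next sibling (or the root) is reached */
        while (n != root && !n->next_sibling)
           n = n->parent;
        n = (n == root) ? NULL : n->next_sibling;
     }
  }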
| |
Prior to this commit rename_variables_block() is recursively called,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.
XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).
Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).
In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block() walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def(), which walks back to
the top of the dominance tree...
In any case, this patch ensures we avoid that problem as well.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <[email protected]>
|
| |
Without this the following line will segfault and we don't get to
see the results of the validate_assert() above.
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Generally you'd see the gl_Color reference first and get some cursor set.
However, in piglit draw-pixel-with-texture we're now seeing the TexCoord
dereferenced first.
Reviewed-by: Rob Clark <[email protected]>
|
| |
Reviewed-by: Rob Clark <[email protected]>
|
| |
In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes. Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.
v2: Fix unintended change to channel_num, drop unspecified const_index
value on blend_const_color_r_float.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
vc4 wants to have per-scalar IO load/stores so that dead code elimination
can happen on a more granular basis, which it has been doing in the
backend using a multiplication by 4 of the intrinsic's driver_location.
We can represent it properly in the NIR using the first_component field,
though.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
The previous nir_load_system_value(b, nir_intrinsic_load_whatever, 0) was
rather verbose, when system values should be easy to generate.
The index is left out because only one system value had an index included
in it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
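To illustrate the ergonomic difference, a hedged before/after sketch; the dedicated helper name below (nir_load_front_face) is an assumption about the generated API, not something taken from this log:

  static nir_ssa_def *
  get_front_face(nir_builder *b)
  {
     /* before: the generic helper with an explicit intrinsic enum and index */
     nir_ssa_def *old_style =
        nir_load_system_value(b, nir_intrinsic_load_front_face, 0);
     (void) old_style;

     /* after: one thin helper per system value (exact name assumed) */
     return nir_load_front_face(b);
  }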
| |
I wanted to include this from nir_builder as well, so it also needed the
undefs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
GLSL-to-NIR generates system value usage, and vc4/freedreno would both
like the system value instead of the varying, so switch this pass over to
it.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.
https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html
With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
| |
load_front_face and load_helper_invocation produce booleans.
On Broadwell:
total instructions in shared programs: 11638956 -> 11638011 (-0.01%)
instructions in affected programs: 115093 -> 114148 (-0.82%)
helped: 628
HURT: 14
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently. Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.
So, combine them all. It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction. This is rather limited - there are other cases we
can easily detect which we should handle.
Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
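A toy, self-contained sketch of what a src_is_type()-style predicate can look like: a value has the requested type if its defining instruction produces that type, or if it merely forwards a set of sources that all have it (stand-in types, not the real nir_search code):

  #include <stdbool.h>

  enum toy_type { TOY_TYPE_UNKNOWN, TOY_TYPE_BOOL, TOY_TYPE_FLOAT };

  struct toy_value {
     enum toy_type produces;              /* type of the defining instruction */
     const struct toy_value **forwards;   /* values it only passes through */
     int num_forwards;
  };

  static bool toy_src_is_type(const struct toy_value *v, enum toy_type type)
  {
     if (v->produces == type)
        return true;
     if (v->num_forwards == 0)
        return false;
     /* e.g. a select/phi is a bool if every source is a bool */
     for (int i = 0; i < v->num_forwards; i++) {
        if (!toy_src_is_type(v->forwards[i], type))
           return false;
     }
     return true;
  }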
| |
The former simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components. The latter implicitly compares with zero
to achieve the same semantics as GLSL's any().
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
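A sketch of what such builder helpers plausibly look like, reconstructed from the description above; the real ones live in nir_builder.h, so treat the names, signatures, and bodies here as an approximation rather than the exact code:

  static inline nir_ssa_def *
  nir_bany_inequal(nir_builder *b, nir_ssa_def *src0, nir_ssa_def *src1)
  {
     /* pick the opcode from the SSA def's component count */
     switch (src0->num_components) {
     case 1: return nir_ine(b, src0, src1);
     case 2: return nir_bany_inequal2(b, src0, src1);
     case 3: return nir_bany_inequal3(b, src0, src1);
     case 4: return nir_bany_inequal4(b, src0, src1);
     default: unreachable("invalid component count");
     }
  }

  static inline nir_ssa_def *
  nir_bany(nir_builder *b, nir_ssa_def *src)
  {
     /* GLSL's any(v) is equivalent to "any component of v != 0" */
     return nir_bany_inequal(b, src, nir_imm_int(b, 0));
  }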
| |
Some shaders include code that looks like:
uniform int i;
uniform vec4 bones[...];
foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
x = i * 3
foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
x = i * 3
foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
x = i * 48;
foo(bones[x], bones[x + 16], bones[x + 32]);
So, ~6 instructions become ~3.
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking at that change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
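The index algebra is easy to sanity-check in isolation: folding the shift into the multiply yields the same offsets while leaving constant additions for CSE to share.

  #include <assert.h>

  int main(void)
  {
     for (int i = 0; i < 64; i++) {
        int x = i * 3;
        /* old form: the shift is applied to every index expression */
        int a0 = (x + 0) << 4, a1 = (x + 1) << 4, a2 = (x + 2) << 4;
        /* rearranged form: one multiply plus constant offsets */
        int y = i * 48;
        assert(a0 == y + 0 && a1 == y + 16 && a2 == y + 32);
     }
     return 0;
  }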
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):
vec3 ssa_52 = fadd ssa_51, ssa_51
vec1 ssa_53 = flog2 ssa_52
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:
vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
vec1 ssa_53 = flog2 ssa_52.x
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
I found a shader in Tales of Maj'Eyal that contains:
if ssa_21 {
block block_1:
/* preds: block_0 */
...instructions that prevent the select peephole...
vec1 32 ssa_23 = imov ssa_4
vec1 32 ssa_24 = imov ssa_4.y
vec1 32 ssa_25 = imov ssa_4.z
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
vec1 32 ssa_26 = imov ssa_4
vec1 32 ssa_27 = imov ssa_4.y
vec1 32 ssa_28 = imov ssa_4.z
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28
Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs. By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.
I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.
On Broadwell:
total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0
v2: Fix return value to be NULL, not false (caught by Iago).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
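A toy sketch of the phi test described above (stand-in types): if every predecessor supplies a move of the same value with the same swizzle, the phi can be rewritten to use that value directly.

  struct toy_phi_src {
     int value_id;        /* which SSA value the predecessor's move reads */
     unsigned swizzle;    /* packed swizzle of that move */
  };

  /* returns the common source's value_id, or -1 if the phi must be kept */
  static int toy_phi_common_source(const struct toy_phi_src *srcs, int count)
  {
     if (count == 0)
        return -1;
     for (int i = 1; i < count; i++) {
        if (srcs[i].value_id != srcs[0].value_id ||
            srcs[i].swizzle != srcs[0].swizzle)
           return -1;
     }
     return srcs[0].value_id;
  }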
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
| |
On Broadwell:
total instructions in shared programs: 11640214 -> 11639872 (-0.00%)
instructions in affected programs: 17744 -> 17402 (-1.93%)
helped: 78
HURT: 0
total spills in shared programs: 2924 -> 2922 (-0.07%)
spills in affected programs: 104 -> 102 (-1.92%)
helped: 1
HURT: 0
total fills in shared programs: 4394 -> 4389 (-0.11%)
fills in affected programs: 237 -> 232 (-2.11%)
helped: 1
HURT: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
| |
nir_opt_peephole_select has the job of removing IF statements with no side
effects. However, if the IF statement's successor didn't have any
instructions in it, we were skipping it, which occurred in mupen64 on vc4
with glsl_to_nir enabled:
instructions in affected programs: 6134 -> 4120 (-32.83%)
total uniforms in shared programs: 38268 -> 38219 (-0.13%)
No changes on Haswell shader-db.
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Looks like a copy and paste error from f752effa087
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
| |
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
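The clever code in question is not shown here, but the general -fstrict-aliasing concern can be illustrated with a generic (non-Mesa) snippet: reading an object through a pointer of an unrelated type is undefined, while memcpy expresses the same bit copy without defeating type-based optimization.

  #include <stdint.h>
  #include <string.h>

  static uint32_t float_bits_unsafe(const float *f)
  {
     return *(const uint32_t *) f;     /* aliasing violation */
  }

  static uint32_t float_bits_safe(float f)
  {
     uint32_t bits;
     memcpy(&bits, &f, sizeof bits);   /* well-defined; compilers fold it */
     return bits;
  }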
| |
"flat centroid" and "flat sample" both just mean "flat", so we should
ignore interpolateAtCentroid/Sample and just return the flat value.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97032
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
| |
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures. Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
| |
Commit 52e75dcb8c04c0dde989970c4c587cbe8313f7cf made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly. However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.
However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.
Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field. So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
| |
This makes sure we give the correct driver location
for doubles when using component packing. Specifically,
it handles packing a dvec3 with a double, which is the
only allowed packing scenario that spans two locations.
Acked-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
| |
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.
flat inputs continue using regular load_input.
v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
passing (in response to review feedback from Chris Forbes).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.
One exception is fragment shader inputs. load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location. This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).
With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location. The intrinsics make this easy:
simply load N components from location <loc, component>. However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.
Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.
We represent this by introducing five new load_barycentric_* intrinsics:
- load_barycentric_pixel (ordinary variable)
- load_barycentric_centroid (centroid qualified variable)
- load_barycentric_sample (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())
Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2. The last two also take a sample
or offset source.
We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.
The intention is that flat inputs will still use regular load_input
intrinsics. This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.
This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.
v2: Document the interp_mode const_index value, define a new
BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
some of them (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
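As a generic illustration of the interpolation model described above (plain C, not NIR): the load_barycentric_* intrinsics produce a coordinate pair, the third coordinate is implied, and the input value is a weighted blend of the per-vertex values.

  /* v holds the attribute's value at the triangle's three corners;
   * (b1, b2) is the barycentric pair, with b0 derived from the other two. */
  static float interpolate_input(const float v[3], float b1, float b2)
  {
     float b0 = 1.0f - b1 - b2;
     return b0 * v[0] + b1 * v[1] + b2 * v[2];
  }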
| |
For intrinsics we don't care about, just skip to the next loop iteration
and process the next instruction. We don't want to execute the rest of
the code.
This was a bug in commit cdfc05ea6e8c87876cdbf588aa8e03d70f3da4bb.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
| |
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass. The code generated for frexp() uses
fabs, and this resulted in an extra instruction. Ultimately I ended up
not using frexp.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
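The identity being exploited is simple enough to check generically: a float converted from an unsigned integer is never negative, so the fabs is a no-op and can be dropped.

  #include <assert.h>
  #include <math.h>

  int main(void)
  {
     unsigned u = 0xdeadbeefu;
     float f = (float) u;    /* always >= 0 */
     assert(fabsf(f) == f);  /* fabs of a non-negative value changes nothing */
     return 0;
  }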
| |
Likewise, rename the enum type to glsl_interp_mode.
Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input. Also, SPIR-V calls these "decorations" rather than "qualifiers".
Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
-e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
-e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;
Signed-off-by: Kenneth Graunke <[email protected]>
Acked-by: Dave Airlie <[email protected]>
|
| |
I recently refactored this to share code between load and atomic
lowering. loads used intrin->num_components, while atomics used
intrin->dest.ssa.num_components. They should be equivalent, but
Jason wanted me to use the latter. I missed applying his review.
Signed-off-by: Kenneth Graunke <[email protected]>
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
This is more readable and also offers assertions that protect against
setting const_index fields on the wrong kind of intrinsic.
Suggested by Jason.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
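A toy sketch of the accessor-plus-assertion pattern described above (stand-in struct; the real accessors are generated alongside the intrinsic definitions):

  #include <assert.h>
  #include <stdbool.h>

  struct toy_intrinsic {
     bool has_base;          /* does this intrinsic define a "base" index? */
     int const_index[3];
  };

  static inline void toy_set_base(struct toy_intrinsic *instr, int base)
  {
     /* the assertion is what catches setting const_index fields on the
      * wrong kind of intrinsic */
     assert(instr->has_base);
     instr->const_index[0] = base;
  }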
| |
The original function was becoming a bit hard to read, with the details
of creating and filling out load/store/atomic intrinsics all in one
function.
This patch makes helpers for creating each type of intrinsic, and also
combines them with the *_op() helpers, as they're closely coupled and
not too large.
v2: Minor style nits from Jason.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
This can't happen; the caller asserts that mode is shader_out or shared.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Both loads and atomics had identical code to rewrite destinations,
and all cases had the same two lines to replace instructions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
The load/store/atomic cases all duplicated the get_io_offset code, with
a few tiny differences: stores didn't bother checking for per-vertex
inputs, because they can't be stored to, and atomics didn't check at
all, since shared variables aren't per-vertex.
However, it's harmless to check, and allows us to share more code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Less typing and fewer word-wrapping issues than intrin->variables[0]->var.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
producing some constants that were getting compared.
total instructions in shared programs: 112276 -> 112198 (-0.07%)
instructions in affected programs: 2239 -> 2161 (-3.48%)
total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
estimated cycles in affected programs: 2365 -> 2301 (-2.71%)
Reviewed-by: Jason Ekstrand <[email protected]>
|