mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir/algebraic: Simplify fsat of fsign	Ian Romanick	2018-10-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	These allows us to not support fsign.sat in the Intel compiler backend, and that will simplify some later changes. No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
*	nir/algebraic: sign(x)xx is abs(x)*x	Ian Romanick	2018-10-09	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	shader-db results: All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15106023 -> 15105981 (<.01%) instructions in affected programs: 300 -> 258 (-14.00%) helped: 6 HURT: 0 helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00% 95% mean confidence interval for instructions value: -7.00 -7.00 95% mean confidence interval for instructions %-change: -14.00% -14.00% Instructions are helped. total cycles in shared programs: 566050327 -> 566050075 (<.01%) cycles in affected programs: 2826 -> 2574 (-8.92%) helped: 6 HURT: 0 helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42 helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92% 95% mean confidence interval for cycles value: -44.30 -39.70 95% mean confidence interval for cycles %-change: -8.95% -8.88% Cycles are helped. No changes on Gen6 or earlier. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
*	nir: Add helper functions to get the instruction that generated a nir_src	Ian Romanick	2018-10-09	1	-0/+23
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Thomas Helland <[email protected]>
*	nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions	Jason Ekstrand	2018-10-04	1	-15/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ssa_for_alu_src helper will correctly handle swizzles and other source modifiers for you. The expansions for unpack_half_2x16, pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards to swizzles. The brokenness of unpack_half_2x16 was causing rendering errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26 which added an extra copy propagation to the optimization pipeline and caused us to start seeing swizzles where we hadn't seen any before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926 Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16." Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes." Tested-by: Alex Smith <[email protected]> Tested-by: Józef Kucia <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	nir/from_ssa: Don't rewrite derefs destinations to registers	Jason Ekstrand	2018-10-02	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	We already call nir_rematerialize_derefs_in_use_blocks_impl prior to calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref uses in the block should hold. This fixes the following CTS test when SPIR-V optimization recipe 1: dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex Fixes: 606eb56ab9449b "intel/nir: Only lower load/store derefs" Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir/cf: Remove phi sources if needed in nir_handle_add_jump	Jason Ekstrand	2018-10-02	1	-17/+21
\| \| \| \| \| \| \| \| \| \| \| \|	If the block in which the jump is inserted is the predecessor of a phi then we need to remove phi sources otherwise the phi may end up with things improperly connected. This fixes the following CTS test when dEQP is run with SPIR-V optimization recipe 1: dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: add initializer data to fix MSVC compile error	Juan A. Suarez Romero	2018-09-19	1	-1/+1
\| \| \| \| \| \| \|	CC: Jason Ekstrand <[email protected]> Fixes: 82799a5d1b8 ("nir: Add a small pass to rematerialize derefs per-block") Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	nir: Add some asserts that we don't put derefs in phis	Jason Ekstrand	2018-09-19	3	-0/+6
\| \| \| \| \| \| \| \| \|	The lcssa and phis_to_regs passes are used by various NIR optimizations that modify the CFG. Putting a couple of asserts will help ensure that we don't accidentally put derefs in phis as part of an optimization pass. Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir/opt_if: Re-materialize derefs in use blocks before peeling loops	Jason Ekstrand	2018-09-19	1	-6/+7
\| \| \| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107879 Cc: "18.2" <[email protected]>
*	nir/loop_unroll: Re-materialize derefs in use blocks before unrolling	Jason Ekstrand	2018-09-19	1	-9/+4
\| \| \| \| \| \| \| \| \| \|	When we're about to re-arrange a bunch of blocks, it's a good idea to make sure that we don't have deref uses crossing block boundaries. Otherwise we may end up with a deref going through a phi and that would be bad. Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "18.2" <[email protected]>
*	nir: Add a small pass to rematerialize derefs per-block	Jason Ekstrand	2018-09-19	2	-0/+134
\| \| \| \| \| \| \| \| \|	This pass re-materializes deref instructions on a per-block basis to ensure that every use of a deref occurs in the same block as the instruction which uses it. Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "18.2" <[email protected]>
*	nir: add loop unroll support for complex wrapper loops	Timothy Arceri	2018-09-14	1	-37/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In GLSL IR we cheat with switch statements and simply convert them into loops with a single iteration. This allowed us to make use of the existing jump instruction handling provided by the loop handing code, it also allows dead code to be cleaned up once we have wrapped the code in a loop. However using loops in this way created previously unrollable loops which limits further optimisations. Here we provide a way to unroll loops that end in a break and have multiple other exits. All shader-db changes are from the dolphin uber shaders. There is a small amount of HURT shaders but in general the improvements far exceed the HURT. shader-db results IVB: total instructions in shared programs: 10018187 -> 10016468 (-0.02%) instructions in affected programs: 104080 -> 102361 (-1.65%) helped: 36 HURT: 15 total cycles in shared programs: 220065064 -> 154529655 (-29.78%) cycles in affected programs: 126063017 -> 60527608 (-51.99%) helped: 51 HURT: 0 total loops in shared programs: 2515 -> 2308 (-8.23%) loops in affected programs: 903 -> 696 (-22.92%) helped: 51 HURT: 0 total spills in shared programs: 4370 -> 4124 (-5.63%) spills in affected programs: 1397 -> 1151 (-17.61%) helped: 9 HURT: 12 total fills in shared programs: 4581 -> 4419 (-3.54%) fills in affected programs: 2201 -> 2039 (-7.36%) helped: 9 HURT: 15 Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: propagates if condition evaluation down some alu chains	Timothy Arceri	2018-09-14	1	-0/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: - only allow nir_op_inot or nir_op_b2i when alu input is 1. - use some helpers as suggested by Jason. v3: - evaluate alu op for single input alu ops - add helper function to decide if to propagate through alu - make use of nir_before_src in another spot shader-db IVB results: total instructions in shared programs: 9993483 -> 9993472 (-0.00%) instructions in affected programs: 1300 -> 1289 (-0.85%) helped: 11 HURT: 0 total cycles in shared programs: 219476091 -> 219476059 (-0.00%) cycles in affected programs: 7675 -> 7643 (-0.42%) helped: 10 HURT: 1 Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: evaluate if condition uses inside the if branches	Timothy Arceri	2018-09-14	2	-0/+138
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we know what side of the branch we ended up on we can just replace the use with a constant. All the spill changes in shader-db are from Dolphin uber shaders, despite some small regressions the change is clearly positive. V2: insert new constant after any phis in the use->parent_instr->type == nir_instr_type_phi path. v3: - use nir_after_block_before_jump() for inserting const - check dominance of phi uses correctly v4: - create some helpers as suggested by Jason. v5 (Jason Ekstrand): - Use LIST_ENTRY to get the phi src shader-db results IVB: total instructions in shared programs: 9999201 -> 9993483 (-0.06%) instructions in affected programs: 163235 -> 157517 (-3.50%) helped: 132 HURT: 2 total cycles in shared programs: 231670754 -> 219476091 (-5.26%) cycles in affected programs: 143424120 -> 131229457 (-8.50%) helped: 115 HURT: 24 total spills in shared programs: 4383 -> 4370 (-0.30%) spills in affected programs: 1656 -> 1643 (-0.79%) helped: 9 HURT: 18 total fills in shared programs: 4610 -> 4581 (-0.63%) fills in affected programs: 374 -> 345 (-7.75%) helped: 6 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]>
*	Replace uses of _mesa_bitcount with util_bitcount	Dylan Baker	2018-09-07	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <[email protected]> (v1) Acked-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	nir: Drop the vs_inputs_dual_locations option	Jason Ekstrand	2018-09-06	3	-42/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	It was very inconsistently handled; the only things that made use of it were glsl_to_nir, glspirv, and nir_gather_info. In particular, nir_lower_io completely ignored it so anyone using nir_lower_io on 64-bit vertex attributes was going to be in for a shock. Also, as of the previous commit, it's set by every driver that supports 64-bit vertex attributes. There's no longer any reason to have it be an option so let's just delete it. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	radeonsi/nir: Set vs_inputs_dual_locations and let NIR do the remap	Jason Ekstrand	2018-09-06	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were going out of our way to disable dual-location re-mapping in NIR only to then do the remapping in st_glsl_to_nir.cpp. Presumably, this was so that double_inputs would be correct for the core state tracker. However, now that we've it to gl_program::DualSlotInputs which is unaffected by NIR lowering, we can let NIR lower things for us. The one tricky bit here is that we have to remap the inputs_read bitfield back to the single-slot convention for the gallium state tracker to use. Since radeonsi is the only NIR-capable gallium driver that also supports GL_ARB_vertex_attrib_64bit, we only have to worry about radeonsi when making core gallium state tracker changes. Acked-by: Marek Olšák <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	compiler: Move double_inputs to gl_program::DualSlotInputs	Jason Ekstrand	2018-09-06	3	-20/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we had two field in shader_info: double_inputs_read and double_inputs. Presumably, the one was for all double inputs that are read and the other is all that exist. However, because nir_gather_info regenerates these two values, there is a possibility, if a variable gets deleted, that the value of double_inputs could change over time. This is a problem because double_inputs is used to remap the input locations to a two-slot-per-dvec3/4 scheme for i965. If that mapping were to change between glsl_to_nir and back-end state setup, we would fall over when trying to map the NIR outputs back onto the GL location space. This commit changes the way slot re-mapping works. Instead of the double_inputs field in shader_info, it adds a DualSlotInputs bitfield to gl_program. By having it in gl_program, we more easily guarantee that NIR passes won't touch it after it's been set. It also makes more sense to put it in a GL data structure since it's really a mapping from GL slots to back-end and/or NIR slots and not really a NIR shader thing. Tested-by: Alejandro Piñeiro <[email protected]> (ARB_gl_spirv tests) Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	anv,i965: Lower away image derefs in the driver	Jason Ekstrand	2018-08-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Add handle/index-based image intrinsics	Jason Ekstrand	2018-08-29	3	-20/+82
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Use a bitfield for image access qualifiers	Jason Ekstrand	2018-08-29	2	-10/+7
\| \| \| \| \| \| \| \| \| \|	This commit expands the current memory access enum to contain the extra two bits provided for images. We choose to follow the SPIR-V convention of NonReadable and NonWriteable because readonly implies that you can read so readonly + writeonly doesn't make as much sense as NonReadable + NonWriteable. Reviewed-by: Kenneth Graunke <[email protected]>
*	intel/compiler: Do image load/store lowering to NIR	Jason Ekstrand	2018-08-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Make image load/store intrinsics variable-width	Jason Ekstrand	2018-08-29	1	-2/+2
\| \| \| \| \| \| \| \| \|	Instead of requiring 4 components, this allows them to potentially use fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so drivers which assume 4 components should be safe. However, we want to be able to shrink them for i965. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Fix a bitmask in unpack_11f11f10f	Jason Ekstrand	2018-08-29	1	-1/+1
\| \| \| \| \| \|	Fixes: 4e337b42f9a2 "nir/format_convert: Add pack/unpack for R11F_G11F_B10F" Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f	Jason Ekstrand	2018-08-29	1	-1/+1
\| \| \| \| \| \|	This matches the unpack function. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Add [us]norm conversion helpers	Jason Ekstrand	2018-08-29	1	-0/+56
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Rename nir_format_bitcast_uint_vec	Jason Ekstrand	2018-08-29	1	-2/+3
\| \| \| \| \| \| \| \|	We have a name for that, it's called a uvec. This just makes the function name a bit shorter. While we're here, we also add an assert for one of the assumptions this function makes. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Add vec mask and sign-extend helpers	Jason Ekstrand	2018-08-29	1	-8/+27
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/format_convert: Add support for unpacking signed integers	Jason Ekstrand	2018-08-29	1	-8/+29
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/opcodes: Make unpack_half_2x16_split_* variable-width	Jason Ekstrand	2018-08-29	1	-4/+4
\| \| \| \| \| \| \|	There is nothing inherent about these opcodes that requires them to only take scalars. It's very convenient if we let them take vectors as well. Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/algebraic: Add some max/min optimizations	Jason Ekstrand	2018-08-29	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Found by inspection. This doesn't help much now but we'll see this pattern with images if you load UNORM and then store UNORM. Shader-db results on Kaby Lake: total instructions in shared programs: 15166916 -> 15166910 (<.01%) instructions in affected programs: 761 -> 755 (-0.79%) helped: 6 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/algebraic: Add more extract_[iu](8\|16) optimizations	Jason Ekstrand	2018-08-29	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the "(a << N) >> M" family of mask or sign-extensions. Not a huge win right now but this pattern will soon be generated by NIR format lowering code. Shader-db results on Kaby Lake: total instructions in shared programs: 15166918 -> 15166916 (<.01%) instructions in affected programs: 36 -> 34 (-5.56%) helped: 2 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/algebraic: Be more careful converting ushr to extract_u8/16	Jason Ekstrand	2018-08-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	If it's not the right bit-size, it may not actually be the correct extraction. For now, we'll only worry about 32-bit versions. Fixes: 905ff8619824 "nir: Recognize open-coded extract_u16" Fixes: 76289fbfa84a "nir: Recognize open-coded extract_u8" Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: add loop unroll support for wrapper loops	Timothy Arceri	2018-08-29	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for unrolling the classic do { // ... } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/opt_loop_unroll: Remove unneeded phis if we make progress	Timothy Arceri	2018-08-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \|	Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but its possible this we could also see the same problem with the existing unrolling passes. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: add complex_loop bool to loop info	Timothy Arceri	2018-08-29	2	-2/+12
\| \| \| \| \| \| \| \| \| \| \|	In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop we need to be sure we didn't encounter any other complex behaviour such as continues, nested breaks, etc during analysis. This will be used in the following patch. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: always attempt to find loop terminators	Timothy Arceri	2018-08-29	1	-7/+7
\| \| \| \| \| \| \|	This will help later patches with unrolling loops that end with a break i.e. loops the always exit on their first interation. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Remove outdated comment	Caio Marcelo de Oliveira Filho	2018-08-28	1	-3/+0
\| \| \| \|	Reviewed-by: Jason Ekstrand <[email protected]>
*	mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering	Kevin Rogovin	2018-08-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This extension provides new GLSL built-in function beginFragmentShaderOrderingIntel() that guarantees (taking wording of GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to same xy window coordinates (and same sample when per-sample shading is active), complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrie (instead of a defining a critcial section) that can be called under arbitary control flow from any function (in contrast the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow. Signed-off-by: Kevin Rogovin <[email protected]> Reviewed-by: Plamena Manolova <[email protected]>
*	nir: Pull block_ends_in_jump into nir.h	Jason Ekstrand	2018-08-27	3	-23/+13
\| \| \| \| \| \| \|	We had two different implementations in different files. May as well have one and put it in nir.h. Reviewed-by: Timothy Arceri <[email protected]>
*	nir: Add an array copy optimization	Jason Ekstrand	2018-08-23	3	-0/+414
\| \| \| \| \| \| \| \| \| \| \| \|	This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for a OpLoad of a large composite followed by OpStore. It can always be improved later if needed. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Add a array-of-vector variable shrinking pass	Jason Ekstrand	2018-08-23	2	-0/+718
\| \| \| \| \| \| \|	This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Add an array splitting pass	Jason Ekstrand	2018-08-23	2	-0/+584
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables. This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. v2 (Jason Ekstrand): - Better comments and naming (some from Caio) - Rework to use one hash map instead of two v2.1 (Jason Ekstrand): - Fix a couple of bugs that were added in the rework including one which basically prevented it from running Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Add a structure splitting pass	Jason Ekstrand	2018-08-23	3	-0/+277
\| \| \| \| \| \| \| \| \| \| \|	This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Add floating point atomic min, max, and compare-swap instrinsics	Ian Romanick	2018-08-22	3	-2/+24
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Add floating point atomic add instrinsics	Ian Romanick	2018-08-22	3	-0/+8
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir: Give end_block its own index	Caio Marcelo de Oliveira Filho	2018-08-22	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \|	Since there's no particular reason for the index to be 0, choose an index that is not used by other block. This is convenient when we store "per-block" data in an array AND look for the successors data (e.g. any kind of backwards data-flow analysis). v2: Add a note about end_block's index. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Skip common instructions when comparing deref paths	Caio Marcelo de Oliveira Filho	2018-08-22	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deref paths may share the same deref instructions in their chains, e.g. ssa_100 = deref_var A ssa_101 = deref_struct "array_field" of ssa_100 ssa_102 = deref_array "[1]" of ssa_101 ssa_103 = deref_struct "field_a" of ssa_102 ssa_104 = deref_struct "field_a" of ssa_103 when comparing the two last deref instructions, their paths will share a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next iteration if the deref instructions are the same. Path[0] (the var) is still handled specially, so in the case above, only ssa_101 and ssa_102 will be skipped. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Export deref comparison functions	Caio Marcelo de Oliveira Filho	2018-08-22	3	-132/+132
\| \| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/vars_to_ssa: Don't build deref nodes for non-local variables	Jason Ekstrand	2018-08-22	1	-4/+14
\| \| \| \|	Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>