mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	nir/clone: Expose nir_constant_clone	Jason Ekstrand	2016-03-23	2	-4/+5
	Reviewed-by: Rob Clark <[email protected]>
*	nir: Fix whitespace	Jason Ekstrand	2016-03-23	1	-1/+1
	Reviewed-by: Rob Clark <[email protected]>
*	nir: Don't abs slt and friends	Ian Romanick	2016-03-22	1	-0/+4
	No shader-db changes, but this is symmetric with the previous commit.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	nir: Don't abs the result of b2f or b2i	Ian Romanick	2016-03-22	1	-0/+2
	In the results below, 2 SIMD16 shaders in Trine are lost.
	G4X
		total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
		instructions in affected programs: 116776 -> 115605 (-1.00%)
		helped: 339 HURT: 0
		total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
		cycles in affected programs: 1767232 -> 1764954 (-0.13%)
		helped: 274 HURT: 81
	Ironlake
		total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
		instructions in affected programs: 218050 -> 215975 (-0.95%)
		helped: 600 HURT: 0
		total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
		cycles in affected programs: 2867452 -> 2864174 (-0.11%)
		helped: 422 HURT: 137
	Sandy Bridge
		total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
		instructions in affected programs: 178529 -> 177114 (-0.79%)
		helped: 596 HURT: 0
		total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
		cycles in affected programs: 1239166 -> 1230988 (-0.66%)
		helped: 369 HURT: 150
	Ivy Bridge
		total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
		instructions in affected programs: 162903 -> 161182 (-1.06%)
		helped: 590 HURT: 0
		total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
		cycles in affected programs: 1004354 -> 991740 (-1.26%)
		helped: 467 HURT: 141
	Haswell
		total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
		instructions in affected programs: 140954 -> 139495 (-1.04%)
		helped: 590 HURT: 0
		total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
		cycles in affected programs: 967080 -> 954374 (-1.31%)
		helped: 452 HURT: 149
		LOST: 2 GAINED: 0
	Broadwell
		total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
		instructions in affected programs: 197232 -> 195490 (-0.88%)
		helped: 715 HURT: 0
		total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
		cycles in affected programs: 975724 -> 961246 (-1.48%)
		helped: 471 HURT: 111
		LOST: 2 GAINED: 0
	Skylake
		total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
		instructions in affected programs: 203012 -> 201270 (-0.86%)
		helped: 715 HURT: 0
		total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
		cycles in affected programs: 993888 -> 979232 (-1.47%)
		helped: 473 HURT: 116
		LOST: 2 GAINED: 0
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	nir: Simplify 0 < fabs(a)	Ian Romanick	2016-03-22	1	-0/+6
	Sandy Bridge / Ivy Bridge / Haswell
		total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
		instructions in affected programs: 564 -> 558 (-1.06%)
		helped: 6 HURT: 0
		total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
		cycles in affected programs: 9768 -> 9582 (-1.90%)
		helped: 12 HURT: 0
	Broadwell / Skylake
		total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
		instructions in affected programs: 626 -> 619 (-1.12%)
		helped: 7 HURT: 0
		total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
		cycles in affected programs: 9378 -> 9192 (-1.98%)
		helped: 12 HURT: 0
	G45 and Ironlake showed no change.
	v2: Modify the comments to look more like a proof.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	nir: Simplify 0 >= b2f(a)	Ian Romanick	2016-03-22	1	-0/+7
	This also prevented some regressions with other patches in my local tree.
	Broadwell / Skylake
		total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
		instructions in affected programs: 45 -> 43 (-4.44%)
		helped: 1 HURT: 0
		total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
		cycles in affected programs: 122 -> 118 (-3.28%)
		helped: 1 HURT: 0
	No changes on earlier platforms.
	v2: Modify the comments to look more like a proof.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
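The identity behind this rule can be checked exhaustively, since a Boolean has only two values. The sketch below assumes that b2f maps false to 0.0 and true to 1.0 and that the comparison presumably reduces to the logical negation of a; the Python function names are illustrative stand-ins, not Mesa code.

```python
def b2f(a: bool) -> float:
    """Model of b2f under the assumed semantics: false -> 0.0, true -> 1.0."""
    return 1.0 if a else 0.0


def original(a: bool) -> bool:
    """The expression named in the commit title: 0 >= b2f(a)."""
    return 0.0 >= b2f(a)


def simplified(a: bool) -> bool:
    """Assumed replacement: the logical negation of a."""
    return not a


# Each tuple is (input, original result, simplified result).
results = [(a, original(a), simplified(a)) for a in (False, True)]
print(results)
```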
*	nir: Simplify i2b with negated or abs operand	Ian Romanick	2016-03-22	1	-0/+2
	This enables removing ssa_201 and ssa_202 in sequences like:
		vec1 ssa_200 = flt ssa_199, ssa_194
		vec1 ssa_201 = b2i ssa_200
		vec1 ssa_202 = i2b -ssa_201
	shader-db results:
	Sandy Bridge
		total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
		instructions in affected programs: 3846 -> 3769 (-2.00%)
		helped: 35 HURT: 0
		total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
		cycles in affected programs: 20072 -> 19600 (-2.35%)
		helped: 20 HURT: 1
	Ivy Bridge
		total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
		instructions in affected programs: 3645 -> 3530 (-3.16%)
		helped: 35 HURT: 0
		total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
		cycles in affected programs: 21082 -> 20628 (-2.15%)
		helped: 25 HURT: 2
	Haswell
		total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
		instructions in affected programs: 3253 -> 3176 (-2.37%)
		helped: 35 HURT: 0
		total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
		cycles in affected programs: 21034 -> 20580 (-2.16%)
		helped: 26 HURT: 1
	Broadwell / Skylake
		total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
		instructions in affected programs: 3223 -> 3146 (-2.39%)
		helped: 35 HURT: 0
		total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
		cycles in affected programs: 21886 -> 21864 (-0.10%)
		helped: 21 HURT: 6
	G45 and Ironlake showed no change.
	Signed-off-by: Ian Romanick <[email protected]>
	Suggested-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
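Why a negate or abs modifier can be dropped under i2b: an integer is nonzero exactly when its negation or absolute value is nonzero, even with 32-bit wraparound. The following is a minimal sketch of that argument, assuming i2b means "x != 0" on 32-bit two's-complement integers; the helper names are hypothetical models, not NIR functions.

```python
def wrap32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def ineg(x: int) -> int:
    return wrap32(-x)


def iabs(x: int) -> int:
    return wrap32(abs(x))


def i2b(x: int) -> bool:
    """Assumed semantics: integer-to-Boolean is a test against zero."""
    return x != 0


# Includes INT_MIN, whose negation wraps back to itself and stays nonzero.
samples = [0, 1, -1, 7, -7, 2**31 - 1, -(2**31)]
neg_ok = all(i2b(ineg(x)) == i2b(x) for x in samples)
abs_ok = all(i2b(iabs(x)) == i2b(x) for x in samples)
print(neg_ok, abs_ok)
```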
*	nir: Lower flrp with Boolean interpolator to bcsel	Ian Romanick	2016-03-22	1	-2/+5
	On Intel platforms that don't set lower_flrp, using bcsel instead of flrp seems to be a small amount worse. On those platforms, the use of flrp, bcsel, and multiply of b2f is still an active area of research. In review, Matt suggested this is because bcsel turns into CMP+SEL, and because of the flag register we can't schedule instructions well.
	shader-db results:
	G4X / Ironlake
		total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
		instructions in affected programs: 161556 -> 157297 (-2.64%)
		helped: 1077 HURT: 1
		total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
		cycles in affected programs: 4174570 -> 4162136 (-0.30%)
		helped: 926 HURT: 53
	Unsurprisingly, no changes on later platforms.
	Signed-off-by: Ian Romanick <[email protected]>
	Reviewed-by: Matt Turner <[email protected]>
*	nir: propagate bitsize information in nir_search	Connor Abbott	2016-03-17	3	-27/+247
	When we replace an expression we have to compute bitsize information for the replacement. We do this in two passes to validate that bitsize information is consistent and correct: first we propagate bitsize from child nodes to parent, then we do it the other way around, starting from the original instruction's destination bitsize.
	v2 (Iago):
	- Always use nir_type_bool32 instead of nir_type_bool when generating algebraic optimizations. Before we used nir_type_bool32 with constants and nir_type_bool with variables.
	- Fix bool comparisons in nir_search.c to account for bitsized types.
	v3 (Sam):
	- Unpack the double constant value as unsigned long long (8 bytes) in nir_algebraic.py.
	v4 (Sam):
	- Use helpers to get type size and base type from nir_alu_type.
	Signed-off-by: Iago Toral Quiroga <[email protected]>
	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: add a bit_size parameter to nir_ssa_dest_init	Connor Abbott	2016-03-17	20	-54/+112
	v2: Squash multiple commits addressing the new parameter in different files so we don't break the build (Iago)
	v3: Fix tgsi (Samuel)
	v4: Fix nir_clone.c (Samuel)
	v5: Fix vc4 and freedreno (Iago)
	v6 (Sam):
	- Fix build errors in nir_lower_indirect_derefs
	- Use helper to get type size from nir_alu_type.
	Signed-off-by: Iago Toral Quiroga <[email protected]>
	Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
	Tested-by: Rob Clark <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: rename nir_const_value fields to include bitsize information	Iago Toral Quiroga	2016-03-17	14	-53/+53
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	nir: update opcode definitions for different bit sizes	Connor Abbott	2016-03-17	5	-157/+262
	Some opcodes need explicit bitsizes, and sometimes we need to use the double version when constant folding.
	v2: fix output type for u2f (Iago)
	v3: do not change vecN opcodes to be float. The next commit will add infrastructure to enable 64-bit integer constant folding so this isn't really necessary. Also, that created problems with source modifiers in some cases (Iago)
	v4 (Jason):
	- do not change bcsel to work in terms of floats
	- leave ldexp generic
	Squashed changes to handle different bit sizes when constant folding since otherwise we would break the build.
	v2:
	- Use the bit-size information from the opcode information if defined (Iago)
	- Use helpers to get type size and base type of nir_alu_type enum (Sam)
	- Do not fall back to sized types to guess bit-size information. (Jason)
	Squashed changes in i965 and gallium/nir drivers to support sized types. These functions should only see sized types, but we can't make that change until we make sure that nir uses the sized versions in all the relevant places. A later commit will address this.
	Signed-off-by: Iago Toral Quiroga <[email protected]>
	Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: add nir_{src,dest}_bit_size() helpers	Connor Abbott	2016-03-17	1	-0/+12
	v2: use a ternary (Jason)
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Add a bit_size to nir_register and nir_ssa_def	Jason Ekstrand	2016-03-17	3	-4/+46
	This really hacky commit adds a bit size to registers and SSA values. It also adds rules in the validator to validate that they do the right things. It's still an open question as to whether or not we want a bit_size in nir_alu_instr or if we just want to let it inherit from the destination. I'm inclined to just let it inherit from the destination. A similar question needs to be asked about intrinsics.
	v2 (Connor):
	- Relax validation: comparisons have explicit destination sizes and implicit source sizes.
	v3 (Sam):
	- Use helpers to get size and base types of nir_alu_type enum.
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Add explicitly sized types	Jason Ekstrand	2016-03-17	1	-1/+28
	v2: Fix size/type mask to properly handle 8-bit types.
	v3: Add helpers to get the bitsize and base type of a nir_alu_type enum.
	Signed-off-by: Juan A. Suarez Romero <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
	Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Lower shared var atomics during nir_lower_io	Jordan Justen	2016-03-17	1	-2/+85
	Signed-off-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add support for lowering load/stores of shared variables	Jordan Justen	2016-03-17	5	-8/+32
	Signed-off-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add atomic operations on variables	Jordan Justen	2016-03-17	1	-0/+27
	This allows us to first generate atomic operations for shared variables using these opcodes, and then later we can lower those to the shared atomics intrinsics with nir_lower_io.
	Signed-off-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add compute shader shared variable storage class	Jordan Justen	2016-03-17	7	-3/+26
	Previously we were receiving shared variable accesses via a lowered intrinsic function from glsl. This change allows us to send in variables instead, for example when converting from SPIR-V.
	Signed-off-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/print: Add space after shader_storage var mode	Jordan Justen	2016-03-17	1	-1/+1
	Signed-off-by: Jordan Justen <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/clone: Add support for cloning a single function_impl	Jason Ekstrand	2016-03-12	2	-32/+81
	Reviewed-by: Jordan Justen <[email protected]>
*	nir/validate: Better function validation	Jason Ekstrand	2016-03-12	1	-7/+15
	Reviewed-by: Jordan Justen <[email protected]>
*	nir/print: Better function argument printing	Jason Ekstrand	2016-03-12	1	-2/+10
	Since we aren't going to put the function parameters or the return variable in the list of locals, it won't get a proper declaration. This changes nir_print to print the type along with each parameter or return variable.
	Reviewed-by: Jordan Justen <[email protected]>
*	nir/print: Factor variable name lookup into a helper	Jason Ekstrand	2016-03-12	1	-30/+36
	Otherwise, we have a problem when we go to print functions with arguments because their names get added to the hash table during declaration which happens after we print the prototype.
	Reviewed-by: Jordan Justen <[email protected]>
*	nir: Create function parameters in function_impl_create	Jason Ekstrand	2016-03-12	1	-0/+20
	Reviewed-by: Jordan Justen <[email protected]>
*	nir: Add a helper for creating a "bare" nir_function_impl	Jason Ekstrand	2016-03-12	2	-10/+20
	Reviewed-by: Jordan Justen <[email protected]>
*	nir: Add a new "param" variable mode for parameters and return variables	Jason Ekstrand	2016-03-12	3	-2/+13
	Reviewed-by: Jordan Justen <[email protected]>
*	nir/glsl: Remove dead function parameter handling code	Jason Ekstrand	2016-03-12	1	-46/+5
	NIR has never been used on IR where we haven't already done function inlining so this code has been dead from the beginning. Let's just get rid of it for now. We can always put it back in if we decide to use NIR for function inlining at some point in the future.
	Reviewed-by: Jordan Justen <[email protected]>
*	nir: Add a pass for lower indirect variable dereferences	Jason Ekstrand	2016-03-08	3	-0/+242
	This new pass lowers load/store_var intrinsics that act on indirect derefs to an if-ladder of direct load/store_var intrinsics. The if-ladders perform a simple binary search on the indirect.
	Reviewed-by: Connor Abbott <[email protected]>
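To illustrate the shape of such an if-ladder, here is a minimal sketch that prints pseudo-IR for an array of four elements. It only models the idea of a binary search over the indirect index; the function `emit_ladder`, the variable names, and the printed syntax are hypothetical and do not reproduce the pass's actual output.

```python
def emit_ladder(index, lo, hi, depth=0):
    """Return pseudo-IR lines that pick arr[index] for lo <= index < hi."""
    pad = "    " * depth
    if hi - lo == 1:
        # Only one candidate remains, so the access becomes a direct one.
        return [f"{pad}result = load_var arr[{lo}]"]
    mid = lo + (hi - lo) // 2
    lines = [f"{pad}if {index} < {mid}:"]
    lines += emit_ladder(index, lo, mid, depth + 1)
    lines.append(f"{pad}else:")
    lines += emit_ladder(index, mid, hi, depth + 1)
    return lines


ladder = emit_ladder("i", 0, 4)
print("\n".join(ladder))
```

Each path through the ladder performs about log2(N) comparisons before reaching a direct access, at the cost of emitting one direct access per array element.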
*	nir: Recognize open-coded extract_u16.	Matt Turner	2016-03-04	1	-0/+5
	No shader-db changes, but does recognize some extract_u16 which enables the next patch to optimize some code.
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Recognize open-coded extract_u8.	Matt Turner	2016-03-04	1	-0/+7
	Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack three bytes from an integer and convert each into a float:
		float((val >> 16u) & 0xffu)
		float((val >> 8u) & 0xffu)
		float((val >> 0u) & 0xffu)
	Instead of shifting, masking, and type converting like this:
		shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
		and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
		mov(8) g17<1>F g16<8,8,1>UD
		shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
		and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
		mov(8) g20<1>F g19<8,8,1>UD
		and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
		mov(8) g22<1>F g21<8,8,1>UD
	i965 can simply extract a byte and convert to float in a single instruction:
		mov(8) g17<1>F g25.2<32,8,4>UB
		mov(8) g20<1>F g25.1<32,8,4>UB
		mov(8) g22<1>F g25.0<32,8,4>UB
	This patch implements the first step: recognizing byte extraction. A later patch will optimize out the conversion to float.
		instructions in affected programs: 28568 -> 27450 (-3.91%)
		helped: 7
		cycles in affected programs: 210076 -> 203144 (-3.30%)
		helped: 7
	This patch decreases the number of instructions in the two Unigine programs by:
		#1721: 4520 -> 4374 instructions (-3.23%)
		#1706: 3752 -> 3582 instructions (-4.53%)
	Reviewed-by: Kenneth Graunke <[email protected]>
	Reviewed-by: Iago Toral Quiroga <[email protected]>
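The equivalence this commit relies on, that shifting and masking with 0xff selects one byte of a 32-bit value, can be sketched as follows. The two Python functions are illustrative models only: `open_coded` mirrors the shader expression quoted above, and `extract_u8` is a hypothetical stand-in for a byte-extract operation, not the NIR opcode implementation.

```python
def open_coded(val: int, byte: int) -> int:
    """The shift-and-mask form: (val >> (8 * byte)) & 0xff."""
    return (val >> (8 * byte)) & 0xFF


def extract_u8(val: int, byte: int) -> int:
    """Model of a byte extract: pick byte number `byte`, counting from the
    least significant byte, and zero-extend it."""
    return val.to_bytes(4, "little")[byte]


samples = [0x00000000, 0x12345678, 0xFF00FF00, 0xFFFFFFFF, 0x80000001]
matches = all(
    open_coded(v, b) == extract_u8(v, b) for v in samples for b in range(4)
)
print(matches, hex(extract_u8(0x12345678, 2)))
```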
*	nir: Remove the const_offset from nir_tex_instr	Jason Ekstrand	2016-02-10	5	-32/+5
	When NIR was originally drafted, there was no easy way to determine if something was constant or not. The result was that we had lots of special-casing for constant values such as this. Now that load_const instructions are SSA-only, it's really easy to find constants and this isn't really needed anymore.
	Reviewed-by: Connor Abbott <[email protected]>
	Reviewed-by: Rob Clark <[email protected]>
*	nir/lower_vec_to_movs: Better report channels handled by insert_mov	Jason Ekstrand	2016-02-10	1	-1/+3
	This fixes two issues. First, we had a use-after-free in the case where the instruction got deleted and we tried to return mov->dest.write_mask. Second, in the case where we are doing a self-mov of a register, we delete those channels that are moved to themselves from the write-mask. This means that those channels aren't reported as being handled even though they are. We now stash off the write-mask before removing unneeded channels so that they still get reported as handled.
	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
	Reviewed-by: Matt Turner <[email protected]>
	Cc: "11.0 11.1" <[email protected]>
*	nir: Separate texture from sampler in nir_tex_instr	Jason Ekstrand	2016-02-09	10	-18/+94
	This commit adds the capability to NIR to support separate textures and samplers. As it currently stands, glsl_to_nir only sets the texture deref and leaves the sampler deref alone as it did before and nir_lower_samplers assumes this. Backends can still assume that they are combined and look only at the texture index. Or, if they wish, they can assume that they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir all set both texture and sampler index whenever a sampler is required (the two indices are the same in this case).
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir/tex_instr: Rename sampler to texture	Jason Ekstrand	2016-02-09	11	-58/+58
	We're about to separate the two concepts. When we do, the sampler will become optional. Doing a rename first makes the separation a bit more safe because drivers that depend on GLSL or TGSI behaviour will be fine to just use the texture index all the time.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Add some braces around loops and ifs	Jason Ekstrand	2016-02-09	1	-5/+10
*	nir: use const_index helpers	Rob Clark	2016-02-09	11	-24/+23
	Signed-off-by: Rob Clark <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	gtn: use const_index helpers	Rob Clark	2016-02-09	1	-8/+9
	Signed-off-by: Rob Clark <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: const_index helpers	Rob Clark	2016-02-09	4	-100/+191
	Direct access to intr->const_index[n], where different slots have different meanings, is somewhat confusing. Instead, let's put some extra info in nir_intrinsic_infos[] about which slots map to what, and add some get/set helpers. The helpers validate that the field being accessed (base/writemask/etc) is applicable for the intrinsic opc, for some extra safety. And nir_print can use this to dump out decoded const_index fields.
	Signed-off-by: Rob Clark <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: remove unused nir_variable fields	Timothy Arceri	2016-02-09	2	-20/+0
	These are used in GLSL IR to remove unused varyings and match transform feedback variables. There is no need to use these in NIR.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	nir: Recognize open-coded bitfield_reverse.	Matt Turner	2016-02-08	1	-0/+13
	Helps 11 shaders in UnrealEngine4 demos. I seriously hope they would have given us bitfieldReverse() if we exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that anyway?).
		instructions in affected programs: 4875 -> 4633 (-4.96%)
		cycles in affected programs: 270516 -> 244516 (-9.61%)
	I suspect there's a lot of room to improve nir_search/opt_algebraic's handling of this. We'd actually like to match, e.g., step2 by matching step1 once and then doing a pointer comparison for the second instance of step1, but unfortunately we generate an enormous tuple instead. The .text size increases by 6.5% and the .data by 17.5%.
		text	data	bss	dec	hex	filename
		22957	45224	0	68181	10a55	nir_libnir_la-nir_opt_algebraic.o
		24461	53160	0	77621	12f35	nir_libnir_la-nir_opt_algebraic.o
	I'd be happy to remove this if Unreal4 turns out to use bitfieldReverse() when it is in a GL 4.0 context, once we expose GL 4.0.
	Reviewed-by: Jason Ekstrand <[email protected]>
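For readers unfamiliar with what an open-coded bit reversal looks like, the sketch below shows the classic five-step swap (halves, bytes, nibbles, bit pairs, single bits) and checks it against a string-based reference. This is an assumption about the general shape of such code: the exact expression matched by the commit is not shown in the log, and both function names are hypothetical.

```python
def open_coded_reverse(x: int) -> int:
    """Reverse the bits of a 32-bit value by swapping ever smaller groups."""
    x &= 0xFFFFFFFF
    x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16)  # 16-bit halves
    x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8)    # bytes
    x = ((x & 0xF0F0F0F0) >> 4) | ((x & 0x0F0F0F0F) << 4)    # nibbles
    x = ((x & 0xCCCCCCCC) >> 2) | ((x & 0x33333333) << 2)    # bit pairs
    x = ((x & 0xAAAAAAAA) >> 1) | ((x & 0x55555555) << 1)    # single bits
    return x


def reference_reverse(x: int) -> int:
    """Reference: reverse the 32-character binary string."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


samples = [0, 1, 0x80000000, 0x12345678, 0xFF00FF00, 0xFFFFFFFF]
agree = all(open_coded_reverse(v) == reference_reverse(v) for v in samples)
print(agree, hex(open_coded_reverse(1)))
```

A single bitfield_reverse operation replaces all five steps, which is consistent with the instruction and cycle reductions reported above.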
*	nir: Handle large unsigned values in opt_algebraic.	Matt Turner	2016-02-08	1	-4/+1
	The next patch adds an algebraic rule that uses the constant 0xff00ff00. Without this change, the build fails with
		return hex(struct.unpack('I', struct.pack('i', self.value))[0])
		struct.error: 'i' format requires -2147483648 <= number <= 2147483647
	The hex() function handles integers of any size, and assigning a negative value to an unsigned does what we want in C. The pack/unpack is unnecessary (and as we see, buggy).
	Reviewed-by: Dylan Baker <[email protected]>
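The failure quoted in this message can be reproduced outside the build. The snippet below is a small standalone sketch, not the code from the Mesa tree: it packs 0xff00ff00 with the signed 'i' format, which is out of range, and then shows that plain hex() handles the same value.

```python
import struct

value = 0xFF00FF00  # larger than the maximum signed 32-bit integer

try:
    # The old approach: pack as signed, reinterpret as unsigned.
    struct.unpack("I", struct.pack("i", value))
    old_approach_failed = False
except struct.error:
    old_approach_failed = True

# The replacement approach: hex() accepts integers of any size.
new_result = hex(value)
# Negative values print with a sign; per the commit message, assigning
# such a constant to an unsigned in C gives the wanted bit pattern.
negative_result = hex(-1)

print(old_approach_failed, new_result, negative_result)
```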
*	nir: Do opt_algebraic in reverse order.	Matt Turner	2016-02-08	1	-2/+2
	Walking the SSA definitions in order means that we consider the smallest algebraic optimizations before larger optimizations. So if a smaller rule is part of a larger rule, the smaller one will happen first, preventing the larger one from happening.
		instructions in affected programs: 32721 -> 32611 (-0.34%)
		helped: 106
	In programs whose nir_optimize loop count changes (129 of them):
		before: 1164 optimization loops
		after: 1071 optimization loops
	Of the 129 affected, 16 programs' optimization loop counts increased. Prevents regressions and annoyances in the next commits.
	Reviewed-by: Eduardo Lima Mitev <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Recognize product of open-coded pow()s.	Matt Turner	2016-02-08	1	-0/+2
	Prevents regressions in the next commit.
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add opt_algebraic rules for xor with zero.	Matt Turner	2016-02-08	1	-0/+2
		instructions in affected programs: 668 -> 664 (-0.60%)
		helped: 4
	Reviewed-by: Eduardo Lima Mitev <[email protected]>
	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add lowering support for unpacking opcodes.	Matt Turner	2016-02-01	2	-0/+32
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Add lowering support for packing opcodes.	Matt Turner	2016-02-01	4	-0/+66
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Add opcodes to extract bytes or words.	Matt Turner	2016-02-01	3	-0/+28
	The uint versions zero extend while the int versions sign extend.
	Reviewed-by: Iago Toral Quiroga <[email protected]>
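The difference between zero extension and sign extension can be sketched with a single helper. The function `extract` and its parameters are hypothetical and model the behaviour described in the message on 32-bit values; they are not the NIR opcode definitions.

```python
def extract(val: int, index: int, bits: int, signed: bool) -> int:
    """Extract field number `index` of width `bits` (8 or 16) from a
    32-bit value, then zero-extend or sign-extend the result."""
    mask = (1 << bits) - 1
    field = ((val & 0xFFFFFFFF) >> (bits * index)) & mask
    if signed and field & (1 << (bits - 1)):
        # The top bit of the field is set, so the signed result is negative.
        field -= 1 << bits
    return field


u8 = extract(0x00000080, 0, 8, signed=False)    # zero-extended byte
i8 = extract(0x00000080, 0, 8, signed=True)     # sign-extended byte
u16 = extract(0xFFFF0000, 1, 16, signed=False)  # zero-extended word
i16 = extract(0xFFFF0000, 1, 16, signed=True)   # sign-extended word
print(u8, i8, u16, i16)
```

The two variants only differ when the top bit of the extracted field is set; for fields such as 0x7f both return the same value.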
*	glsl: Remove 2x16 half-precision pack/unpack opcodes.	Matt Turner	2016-02-01	1	-9/+0
	i965/fs was the only consumer, and we're now doing the lowering in NIR.
	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	nir: Add lowering of nir_op_unpack_half_2x16.	Matt Turner	2016-02-01	2	-4/+29
	Reviewed-by: Iago Toral Quiroga <[email protected]>