mesa.git

	Commit message	Author	Age	Files	Lines
*	nir/lower_clip: Fix incorrect driver loc for clipdist outputs	Rob Clark	2019-12-04	1	-0/+11
|	Somehow adjusting maxloc based on existing outputs got lost, so the clipdist varying clobbered the position varying. In freedreno/ir3 this produced a shader with no position output, which triggers GPU hangs in neverball.
|
|	Fixes: d0f746b6458 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
|	Signed-off-by: Rob Clark <[email protected]>
|	Reviewed-by: Kristian H. Kristensen <[email protected]>
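The fix depends on choosing a driver location for the new clip-distance output that lies past every existing output. A minimal sketch of that bookkeeping, assuming a hypothetical, simplified `struct output_var` rather than NIR's real `nir_variable` (a real clip-distance output may also span more than one slot):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an output variable; not NIR's nir_variable. */
struct output_var {
   int driver_location;
};

/* Return the first driver location that does not collide with any
 * existing output. Skipping this scan (the bug described above) hands
 * out a slot that may already hold another varying, such as position. */
static int
next_free_driver_location(const struct output_var *outputs, size_t count)
{
   int maxloc = -1;
   for (size_t i = 0; i < count; i++) {
      if (outputs[i].driver_location > maxloc)
         maxloc = outputs[i].driver_location;
   }
   return maxloc + 1;
}
```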
*	nir/load_store_vectorize: fix combining stores with aliasing loads between	Rhys Perry	2019-12-04	2	-2/+16
|	v2: add test
|
|	Fixes: ce9205c03bd ('nir: add a load/store vectorization pass')
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Daniel Schürmann <[email protected]> (v1)
|	Reviewed-by: Connor Abbott <[email protected]> (v2)
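Merging two stores effectively moves one of them past whatever sits between them, so an intervening load that reads the moved store's bytes would see the wrong data. A simplified sketch of that check, assuming the merged store is placed at the second store's position and that every access is a hypothetical byte range on the same resource (the real pass also reasons about different resources and variable modes):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a memory access: [offset, offset + size). */
struct access {
   long offset;
   long size;
};

static bool
ranges_overlap(struct access a, struct access b)
{
   return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* If the earlier store is delayed to the later store's position, any load
 * in between that reads bytes the earlier store writes would observe stale
 * memory, so such a load blocks the combination. */
static bool
can_combine_stores(struct access first_store,
                   const struct access *loads_between, size_t num_loads)
{
   for (size_t i = 0; i < num_loads; i++) {
      if (ranges_overlap(first_store, loads_between[i]))
         return false;
   }
   return true;
}
```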
*	nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select	Ian Romanick	2019-12-02	1	-0/+53
|	Reviewed-by: Matt Turner <[email protected]>
|
|	All Intel platforms had similar results. (Ice Lake shown)
|	total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
|	instructions in affected programs: 316166 -> 309237 (-2.19%)
|	helped: 905
|	HURT: 10
|	helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
|	helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
|	HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
|	HURT stats (rel) min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
|	95% mean confidence interval for instructions value: -7.91 -7.23
|	95% mean confidence interval for instructions %-change: -4.46% -3.99%
|	Instructions are helped.
|
|	total cycles in shared programs: 228571646 -> 228549759 (<.01%)
|	cycles in affected programs: 56239919 -> 56218032 (-0.04%)
|	helped: 681
|	HURT: 216
|	helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
|	helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
|	HURT stats (abs) min: 1 max: 320 x̄: 42.09 x̃: 14
|	HURT stats (rel) min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
|	95% mean confidence interval for cycles value: -41.51 -7.29
|	95% mean confidence interval for cycles %-change: -0.80% -0.49%
|	Cycles are helped.
|
|	LOST: 1
|	GAINED: 0
*	nir/algebraic: Simplify some Inf and NaN avoidance code	Ian Romanick	2019-12-02	1	-0/+9
|	Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0.
|
|	The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders.
|
|	An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
|
|	Reviewed-by: Alyssa Rosenzweig <[email protected]>
|	Reviewed-by: Matt Turner <[email protected]>
|
|	All Gen7+ platforms had similar results. (Ice Lake shown)
|	total instructions in shared programs: 14661140 -> 14661104 (<.01%)
|	instructions in affected programs: 7520 -> 7484 (-0.48%)
|	helped: 36
|	HURT: 0
|	helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
|	helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
|	95% mean confidence interval for instructions value: -1.00 -1.00
|	95% mean confidence interval for instructions %-change: -0.52% -0.47%
|	Instructions are helped.
|
|	total cycles in shared programs: 228585416 -> 228584806 (<.01%)
|	cycles in affected programs: 56321 -> 55711 (-1.08%)
|	helped: 32
|	HURT: 0
|	helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
|	helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
|	95% mean confidence interval for cycles value: -28.32 -9.80
|	95% mean confidence interval for cycles %-change: -1.63% -0.54%
|	Cycles are helped.
|
|	Sandy Bridge
|	total cycles in shared programs: 152991077 -> 152991075 (<.01%)
|	cycles in affected programs: 11525 -> 11523 (-0.02%)
|	helped: 2
|	HURT: 2
|	helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
|	helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
|	HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
|	HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
|	95% mean confidence interval for cycles value: -5.27 4.27
|	95% mean confidence interval for cycles %-change: -0.16% 0.15%
|	Inconclusive result (value mean confidence interval includes 0).
|
|	No changes on Iron Lake or GM45.
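The constant intBitsToFloat(0x7f7fffff) is the largest finite single-precision float, the same value np.finfo(np.float32).max would give. A small C check of that equivalence, assuming IEEE-754 floats and using a hypothetical helper that reinterprets the bits with memcpy:

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, like GLSL's intBitsToFloat().
 * memcpy avoids the strict-aliasing problems of a pointer cast. */
static float
int_bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}
```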
*	nir/opt_peephole_select: Don't count some unary operations	Ian Romanick	2019-12-02	1	-1/+15
|	In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into another instruction as either source or destination modifiers. Counting them as instructions means that some if-statements won't get converted to selects. For example,
|
|	    vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
|	    /* succs: block_1 block_2 */
|	    if ssa_25 {
|	        block block_1:
|	        /* preds: block_0 */
|	        vec1 32 ssa_26 = fabs ssa_24
|	        vec1 32 ssa_27 = fneg ssa_26
|	        vec1 32 ssa_28 = fabs ssa_20
|	        vec1 32 ssa_29 = fneg ssa_28
|	        vec1 32 ssa_30 = fmul ssa_27, ssa_29
|	        vec1 32 ssa_31 = fsat ssa_30
|	        /* succs: block_3 */
|	    } else {
|	        block block_2:
|	        /* preds: block_0 */
|	        /* succs: block_3 */
|	    }
|	    block block_3:
|	    /* preds: block_1 block_2 */
|
|	block_1 isn't really 6 instructions, but it will be counted that way. Most callers of the peephole_select pass use a limit of either 1 or 8. It's very easy to blow way past either of these limits with things that are really only one or two actual instructions.
|
|	I also tried some fancier things, like making sure the fsat was of another SSA def from the same block, but the simple test was actually better.
|
|	The i965 back-end SEL peephole pass still helps ~700 shaders in shader-db with this change.
|
|	Reviewed-by: Alyssa Rosenzweig <[email protected]>
|	Reviewed-by: Matt Turner <[email protected]>
|
|	All Gen6+ platforms had similar results. (Ice Lake shown)
|	total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
|	instructions in affected programs: 156575 -> 151791 (-3.06%)
|	helped: 1204
|	HURT: 0
|	helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
|	helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
|	95% mean confidence interval for instructions value: -4.12 -3.82
|	95% mean confidence interval for instructions %-change: -5.35% -4.95%
|	Instructions are helped.
|
|	total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
|	cycles in affected programs: 2818975 -> 2672750 (-5.19%)
|	helped: 876
|	HURT: 322
|	helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
|	helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
|	HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
|	HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
|	95% mean confidence interval for cycles value: -130.47 -113.64
|	95% mean confidence interval for cycles %-change: -14.85% -12.72%
|	Cycles are helped.
|
|	total sends in shared programs: 730495 -> 730491 (<.01%)
|	sends in affected programs: 46 -> 42 (-8.70%)
|	helped: 2
|	HURT: 0
|
|	Iron Lake and GM45 had similar results. (Iron Lake shown)
|	total instructions in shared programs: 8122757 -> 8122617 (<.01%)
|	instructions in affected programs: 14716 -> 14576 (-0.95%)
|	helped: 46
|	HURT: 1
|	helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
|	helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
|	HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
|	HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
|	95% mean confidence interval for instructions value: -3.42 -2.54
|	95% mean confidence interval for instructions %-change: -3.28% -1.62%
|	Instructions are helped.
|
|	total cycles in shared programs: 188510100 -> 188509780 (<.01%)
|	cycles in affected programs: 58994 -> 58674 (-0.54%)
|	helped: 32
|	HURT: 1
|	helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
|	helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
|	HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
|	HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
|	95% mean confidence interval for cycles value: -16.34 -3.06
|	95% mean confidence interval for cycles %-change: -2.46% -0.15%
|	Cycles are helped.
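The counting rule can be sketched as a filter over the block's opcodes. This assumes a hypothetical opcode enum rather than NIR's `nir_op`, and shows only the simple test the message describes: the five modifier-like opcodes are skipped, so block_1 from the example counts as a single instruction instead of six.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes covering the example above; not nir_op. */
enum opcode { OP_FABS, OP_FNEG, OP_FSAT, OP_INEG, OP_IABS, OP_FMUL, OP_FADD };

/* Opcodes that a back end can usually fold into another instruction as a
 * source or destination modifier. */
static bool
is_foldable_modifier(enum opcode op)
{
   switch (op) {
   case OP_FABS: case OP_FNEG: case OP_FSAT:
   case OP_INEG: case OP_IABS:
      return true;
   default:
      return false;
   }
}

/* Count the instructions that probably cost something, for comparison
 * against the caller's limit (commonly 1 or 8). */
static size_t
count_costly_instrs(const enum opcode *block, size_t n)
{
   size_t count = 0;
   for (size_t i = 0; i < n; i++) {
      if (!is_foldable_modifier(block[i]))
         count++;
   }
   return count;
}
```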
*	nir/lower_io_to_vector: don't create arrays when not needed	Rhys Perry	2019-12-02	1	-1/+7
|	Some backends require that there are no array varyings. If there were no arrays in the input shader, the pass shouldn't have to create new ones.
|
|	Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
|	Fixes: bcd14756eec ('nir/lower_io_to_vector: add flat mode')
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir: Make algebraic backtrack and reprocess after a replacement.	Eric Anholt	2019-11-26	2	-22/+97
|	The algebraic pass was exhibiting O(n^2) behavior in dEQP-GLES2.functional.uniform_api.random.3 and dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with other code-generated tests, and likely real-world loop-unroll cases). In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to transform:
|
|	    result = b2f(a == b);
|	    result *= b2f(c == d);
|	    ...
|	    result *= b2f(z == w);
|
|	    ->
|
|	    temp = (a == b)
|	    temp = temp && (c == d)
|	    ...
|	    temp = temp && (z == w)
|	    result = b2f(temp);
|
|	nir_opt_algebraic, proceeding bottom-to-top, would match and convert the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to be matched by the next fmul down on the next time algebraic got run by the optimization loop.
|
|	Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse order."), Matt changed algebraic to go bottom-to-top so that we would match the biggest patterns first. This helped his cases, but I believe introduced this failure mode. Instead of reverting that, now that we've got the automaton, we can update the automaton's state recursively and just re-process any instructions whose state has changed (indicating that they might match new things). There's a small chance that the state will hash to the same value and miss out on this round of algebraic, but this seems to be good enough to fix dEQP.
|
|	Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
|
|	Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling outliers removed)
|
|	dEQP-GLES2.functional.uniform_api.random.3 runtime -65.3512% +/- 4.22369% (n=21, was 1.4s)
|
|	dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime -68.8066% +/- 6.49523% (was 4.8s)
|
|	v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of tricky code. Runtime of uniform_api.random.3 down -0.790299% +/- 0.244213% compared to v1.
|
|	v3: Re-add the nir_instr_remove() that I accidentally dropped in v2, fixing infinite loops.
|
|	Reviewed-by: Connor Abbott <[email protected]>
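To see why one bottom-to-top sweep collapses only one link of the chain per run, and why re-queuing users fixes that, here is a toy model. It is not NIR and not the automaton-based implementation: it is a hypothetical mini-IR with a single rule, fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)), and a single stack worklist onto which every rewrite pushes the rewritten instruction's users.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy opcodes; OP_COND stands for a boolean comparison such as a == b. */
enum op { OP_COND, OP_B2F, OP_FMUL, OP_IAND };

/* Toy instruction: src0/src1 index other instructions, -1 if unused. */
struct instr {
   enum op op;
   int src0, src1;
};

#define MAX_INSTRS 64

struct prog {
   struct instr instrs[MAX_INSTRS];
   int count;
};

static int
emit(struct prog *p, enum op op, int src0, int src1)
{
   assert(p->count < MAX_INSTRS);
   p->instrs[p->count] = (struct instr){ op, src0, src1 };
   return p->count++;
}

/* result = b2f(c0) * b2f(c1) * ... * b2f(c[k-1]); returns the last fmul. */
static int
build_chain(struct prog *p, int k)
{
   p->count = 0;
   int acc = emit(p, OP_B2F, emit(p, OP_COND, -1, -1), -1);
   for (int j = 1; j < k; j++) {
      int b = emit(p, OP_B2F, emit(p, OP_COND, -1, -1), -1);
      acc = emit(p, OP_FMUL, acc, b);
   }
   return acc;
}

/* Rule: fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)). The fmul is rewritten in
 * place; the new iand is appended, so this toy ignores instruction order. */
static bool
try_rewrite(struct prog *p, int i)
{
   struct instr *in = &p->instrs[i];
   if (in->op != OP_FMUL || p->instrs[in->src0].op != OP_B2F ||
       p->instrs[in->src1].op != OP_B2F)
      return false;
   int x = p->instrs[in->src0].src0;
   int y = p->instrs[in->src1].src0;
   in->op = OP_B2F;
   in->src0 = emit(p, OP_IAND, x, y);
   in->src1 = -1;
   return true;
}

/* One bottom-to-top pass without reprocessing; returns the rewrite count. */
static int
sweep_once(struct prog *p)
{
   int rewrites = 0;
   for (int i = p->count - 1; i >= 0; i--)
      rewrites += try_rewrite(p, i);
   return rewrites;
}

/* Same pass, but every rewrite re-queues the users of the rewritten instr. */
static int
sweep_with_worklist(struct prog *p)
{
   int stack[4 * MAX_INSTRS], top = 0, rewrites = 0;
   for (int i = 0; i < p->count; i++)
      stack[top++] = i;                 /* the last instruction pops first */
   while (top > 0) {
      int i = stack[--top];
      if (!try_rewrite(p, i))
         continue;
      rewrites++;
      for (int u = 0; u < p->count; u++) {
         if (p->instrs[u].src0 == i || p->instrs[u].src1 == i) {
            assert(top < 4 * MAX_INSTRS);
            stack[top++] = u;
         }
      }
   }
   return rewrites;
}
```

With a chain of k conditions, `sweep_once` must run k - 1 times (each run rescanning the whole program) to reach the fully collapsed form, which is the O(n^2) pattern described above; `sweep_with_worklist` gets there in one run.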
*	nir: Refactor algebraic's block walk	Eric Anholt	2019-11-26	1	-31/+31
|	My motivation was to clarify the changes in the following commit, but incidentally, it reduces runtime of dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy testcase) by -5.39524% +/- 2.21179% (n=15).
|
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir: Maintain the algebraic automaton's state as we work.	Connor Abbott	2019-11-26	2	-38/+78
|	In order to have nir_opt_algebraic be able to do further algebraic work on the output of a replacement, we need to maintain the automaton's state.
|
|	Reviewed-by: Eric Anholt <[email protected]>
*	nir: Add a scheduler pass to reduce maximum register pressure.	Eric Anholt	2019-11-25	3	-0/+1092
|	This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable.
|
|	A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed.
|
|	Results for v3d:
|
|	total instructions in shared programs: 6497588 -> 6518242 (0.32%)
|	total threads in shared programs: 154000 -> 152828 (-0.76%)
|	total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
|	total spills in shared programs: 4984 -> 472 (-90.53%)
|	total fills in shared programs: 6418 -> 1546 (-75.91%)
|
|	Acked-by: Alyssa Rosenzweig <[email protected]> (v1)
|	Reviewed-by: Alejandro Piñeiro <[email protected]> (v2)
|
|	v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want.
|	v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.
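The Goodman/Hsu rule quoted above can act as a tie-breaker when choosing among ready instructions. A hedged sketch, assuming a hypothetical `struct value` that tracks how many uses of each value are still unscheduled; the real pass works on NIR's DAG and weighs more factors than this one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a value read by candidate instructions. */
struct value {
   int total_uses;
   int unscheduled_uses;
};

/* Hypothetical ready-to-schedule instruction: the values it reads. */
struct candidate {
   const struct value *srcs[4];
   int num_srcs;
};

/* True if some source has part, but not all, of its uses scheduled, i.e.
 * choosing this candidate continues consuming a value already started. */
static bool
continues_started_value(const struct candidate *c)
{
   for (int i = 0; i < c->num_srcs; i++) {
      const struct value *v = c->srcs[i];
      if (v->unscheduled_uses > 0 && v->unscheduled_uses < v->total_uses)
         return true;
   }
   return false;
}

/* Prefer the first candidate that continues a started value, falling back
 * to the first candidate. Returns -1 if there are no candidates. */
static int
choose_candidate(const struct candidate *cands, size_t n)
{
   if (n == 0)
      return -1;
   for (size_t i = 0; i < n; i++) {
      if (continues_started_value(&cands[i]))
         return (int)i;
   }
   return 0;
}
```

Finishing a started value's uses lets that value die sooner, which is what keeps, for example, a texture result from staying live across unrelated work.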
*	nir: add load/store vectorizer tests	Rhys Perry	2019-11-25	2	-0/+1763
|	v7: run nir_opt_algebraic
|	v9: rework the callback function
|	v9: update alignment on all loads/stores, even if they're not vectorized
|	v10: add tests for 64-bit offsets
|	v10: add tests for signed offsets
|
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]> (v9)
*	nir: add a load/store vectorization pass	Rhys Perry	2019-11-25	3	-0/+1313
|	This pass combines intersecting, adjacent and identical loads/stores into potentially larger ones and will be used by ACO to greatly reduce the number of memory operations.
|
|	v2: handle nir_deref_type_ptr_as_array
|	v3: assume explicitly laid out types for derefs
|	v4: create fewer deref casts
|	v4: fix shared boolean vectorization
|	v4: fix copy+paste error in resources_different
|	v4: fix extract_subvector() to pass nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
|	v4: rebase
|	v5: subtract from deref/offset instead of scheduling offset calculations
|	v5: various non-functional changes/cleanups
|	v5: require less metadata and preserve more
|	v5: rebase
|	v6: cleanup and improve dependency handling
|	v6: emit fewer deref casts
|	v6: pass undef to components not set in the write_mask for new stores
|	v7: fix 8-bit extract_vector() with 64-bit input
|	v7: cleanup creation of store write data
|	v7: update align correctly for when the bit size of load/store increases
|	v7: rename extract_vector to extract_component and update comment
|	v8: prevent combining of row-major matrix column accesses
|	v9: rework process_block() to be able to vectorize more
|	v9: rework the callback function
|	v9: update alignment on all loads/stores, even if they're not vectorized
|	v9: remove entry::store_value, since it will not be updated if it was from a vectorized load
|	v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
|	v9: handle nir_intrinsic_scoped_memory_barrier
|	v10: use nir_ssa_scalar
|	v10: handle non-32-bit offsets
|	v10: use signed offsets for comparison
|	v10: improve create_entry_key_from_offset()
|	v10: support load_shared/store_shared
|	v10: remove strip_deref_casts()
|	v10: don't ever pass NULL to memcmp
|	v10: remove recursion in gcd()
|	v10: fix outdated comment
|	v11: use the new nir_extract_bits()
|	v12: remove use of nir_src_as_const_value in resources_different
|	v13: make entry key hash function deterministic
|	v13: simplify mask_sign_extend()
|	v14: add comment in hash_entry_key() about hashing pointers
|
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]> (v9)
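The geometric test behind combining "intersecting, adjacent and identical" accesses can be shown on byte ranges. A minimal sketch, assuming both accesses hit the same resource at known constant offsets; the real pass derives offsets from deref chains or offset sources, and also checks alignment, bit sizes, write masks and ordering before merging:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte range of one load or store on a single resource. */
struct range {
   int64_t offset;  /* signed, like the pass's signed offset comparison */
   int64_t size;
};

/* Ranges can be merged when they overlap (intersecting or identical) or
 * touch end to end (adjacent). */
static bool
can_merge(struct range a, struct range b)
{
   return a.offset <= b.offset + b.size && b.offset <= a.offset + a.size;
}

/* Smallest range covering both inputs; only meaningful if can_merge(). */
static struct range
merge(struct range a, struct range b)
{
   int64_t lo = a.offset < b.offset ? a.offset : b.offset;
   int64_t a_end = a.offset + a.size;
   int64_t b_end = b.offset + b.size;
   int64_t hi = a_end > b_end ? a_end : b_end;
   return (struct range){ lo, hi - lo };
}
```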
*	nir: add nir_num_variable_modes and nir_var_mem_push_const	Rhys Perry	2019-11-25	2	-2/+9
|	These will be useful in the upcoming load/store vectorizer.
|
|	v11: rebase
|
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]>
|	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: no-op C99 _Pragma() with MSVC	Brian Paul	2019-11-23	1	-0/+7
|	This fixes a build failure on MSVC. BTW, it looks like clang supports _Pragma() but I don't know if it understands the "gcc unroll N" directive.
|
|	Signed-off-by: Brian Paul <[email protected]>
|	Reviewed-by: Ian Romanick <[email protected]>
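One way such a guard can look, sketched with a hypothetical `PRAGMA_UNROLL(n)` helper (not necessarily the names Mesa uses): GCC 8 and later get the real directive through C99 `_Pragma()`, while other compilers, including MSVC builds that lack `_Pragma()`, get an empty expansion. Clang is excluded here only because, as the message notes, support for the GCC directive is uncertain.

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__)
/* _Pragma() takes a string literal, so build "GCC unroll n" with #. */
#define PRAGMA_STRINGIFY(x) _Pragma(#x)
#define PRAGMA_UNROLL(n) PRAGMA_STRINGIFY(GCC unroll n)
#else
/* MSVC and other compilers: the hint becomes a no-op. */
#define PRAGMA_UNROLL(n)
#endif

static int
sum4(const int v[4])
{
   int total = 0;
   PRAGMA_UNROLL(4)
   for (int i = 0; i < 4; i++)
      total += v[i];
   return total;
}
```

The loop is correct with or without the pragma; the pragma only asks GCC to unroll it.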
*	nir/serialize: support any num_components for remaining instructions	Marek Olšák	2019-11-23	1	-4/+13
|	Only NPOT vectors greater than vec4 use the extra uint32. This is for instructions that share the dest code. load_const and undef already support 1-16 in the header.
|
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: use 3 unused bits in intrinsic for packed_const_indices	Marek Olšák	2019-11-23	1	-11/+10
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components	Marek Olšák	2019-11-23	1	-6/+16
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: serialize writemask for vec8 and vec16	Marek Olšák	2019-11-23	1	-9/+16
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: serialize swizzles for vec8 and vec16	Marek Olšák	2019-11-23	1	-8/+43
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU	Marek Olšák	2019-11-23	1	-3/+33
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: remove up to 3 consecutive equal ALU instruction headers	Marek Olšák	2019-11-23	1	-16/+65
|	vec4 scalarized ALUs typically have 4 equal instruction headers, so remove the last 3.
|
|	There are no bits left in the ALU header for more flags, so future extensions of NIR will have to use something like instr_type == 15 to describe more complex ALU instructions.
|
|	Reviewed-by: Connor Abbott <[email protected]>
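The idea can be sketched as a bounded run-length encoding: write a header once, plus a count (0 to 3) of how many following instructions reuse it. This uses a hypothetical 2-bit repeat field and 30-bit headers purely for illustration; it does not reproduce nir_serialize's actual bit layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTRA_REPEATS 3   /* fits a hypothetical 2-bit field */

/* Each output word is (header << 2) | repeats, where repeats counts how many
 * of the following instructions share the same header. Assumes headers fit
 * in 30 bits. Returns the number of words written to out. */
static size_t
pack_headers(const uint32_t *headers, size_t n, uint32_t *out)
{
   size_t written = 0;
   for (size_t i = 0; i < n;) {
      assert(headers[i] < (1u << 30));
      uint32_t repeats = 0;
      while (repeats < MAX_EXTRA_REPEATS && i + 1 + repeats < n &&
             headers[i + 1 + repeats] == headers[i])
         repeats++;
      out[written++] = (headers[i] << 2) | repeats;
      i += 1 + repeats;
   }
   return written;
}
```

A scalarized vec4 operation (four identical headers) then costs one word instead of four.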
*	nir/serialize: try to pack both deref array src into 32 bits	Marek Olšák	2019-11-23	1	-5/+28
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: cleanup - fold nir_deref_type_var cases into switches	Marek Olšák	2019-11-23	1	-16/+19
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: try to put deref->var index into the unused bits of the header	Marek Olšák	2019-11-23	1	-10/+23
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: don't serialize mode for deref non-cast instructions	Marek Olšák	2019-11-23	1	-5/+12
|	It can be derived from src and var. This frees 10 bits in the header that will be used later.
|
|	"mode" is moved in the structure, because those bits will be used for something else later.
|
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: don't store deref types if not needed	Marek Olšák	2019-11-23	1	-4/+26
|	- type_cast: deduplicate types if the last one is the same
|	- derive the type from the parent for other derefs
|
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: try to pack two alu srcs into 1 uint32	Marek Olšák	2019-11-23	1	-21/+76
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: pack nir_intrinsic_instr::const_index[] better	Marek Olšák	2019-11-23	1	-5/+84
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: pack 1-component constants into 20 bits if possible	Marek Olšák	2019-11-23	1	-37/+135
|	The majority of constants can be packed like this.
|
|	v2: - use enum for the packing encoding
|	    - trim packed_value to 20 bits and add 1 bit to last_component, which simplifies a later commit
|
|	Reviewed-by: Connor Abbott <[email protected]>
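Why many 32-bit constants fit in 20 bits: common floats such as 1.0 have their low 12 bits clear, so the high 20 bits are enough, and small integers sign-extend from 20 bits. A hypothetical sketch of two such encodings (not necessarily the exact ones nir_serialize uses; it also assumes the usual two's-complement conversion to `int32_t`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum const_packing { PACK_NONE, PACK_HI_20BITS, PACK_LO_20BITS_SEXT };

/* Choose an encoding that stores a 32-bit constant in 20 bits, if any. */
static enum const_packing
choose_packing(uint32_t value, uint32_t *packed)
{
   if ((value & 0xfffu) == 0) {              /* low 12 bits are zero */
      *packed = value >> 12;
      return PACK_HI_20BITS;
   }
   int32_t s = (int32_t)value;
   if (s >= -(1 << 19) && s < (1 << 19)) {   /* fits a signed 20-bit field */
      *packed = value & 0xfffffu;
      return PACK_LO_20BITS_SEXT;
   }
   return PACK_NONE;
}

/* Inverse of choose_packing() for the two packed encodings. */
static uint32_t
unpack(enum const_packing how, uint32_t packed)
{
   if (how == PACK_HI_20BITS)
      return packed << 12;
   /* Sign-extend bit 19 back to 32 bits. */
   return (packed & 0x80000u) ? (packed | 0xfff00000u) : packed;
}
```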
*	nir/serialize: pack load_const with non-64-bit constants better	Marek Olšák	2019-11-23	1	-2/+46
|	v2: use blob_write_uint8/16
|
|	Reviewed-by: Jason Ekstrand <[email protected]> (v1)
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: try to store a diff in var data locations instead of var data	Marek Olšák	2019-11-23	1	-15/+73
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: deduplicate serialized var types by reusing the last unique one	Marek Olšák	2019-11-23	1	-10/+39
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: don't serialize var->data for temporaries	Marek Olšák	2019-11-23	1	-12/+37
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/serialize: pack src better and limit the object count to 1M from 1G	Marek Olšák	2019-11-23	1	-33/+75
|	We need to limit the object count to 1M to free 10 bits for the src modifiers.
|
|	Reviewed-by: Connor Abbott <[email protected]>
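The arithmetic behind the limit: 1G objects need a 30-bit index (2^30), while 1M needs only 20 bits (2^20), which frees 10 bits of a 32-bit word. A hypothetical packing of such a word, with illustrative field names and order rather than nir_serialize's real layout:

```c
#include <assert.h>
#include <stdint.h>

#define OBJECT_IDX_BITS 20                        /* 2^20 = 1M objects */
#define MAX_OBJECTS     (1u << OBJECT_IDX_BITS)
#define MODIFIER_BITS   10                        /* the 10 freed bits */

/* Pack a 20-bit object index and 10 bits of hypothetical modifier flags;
 * the top 2 bits of the word stay unused in this sketch. */
static uint32_t
pack_src(uint32_t object_idx, uint32_t modifiers)
{
   assert(object_idx < MAX_OBJECTS);
   assert(modifiers < (1u << MODIFIER_BITS));
   return object_idx | (modifiers << OBJECT_IDX_BITS);
}

static uint32_t
src_object_idx(uint32_t packed)
{
   return packed & (MAX_OBJECTS - 1);
}

static uint32_t
src_modifiers(uint32_t packed)
{
   return (packed >> OBJECT_IDX_BITS) & ((1u << MODIFIER_BITS) - 1);
}
```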
*	nir/serialize: pack instructions better	Marek Olšák	2019-11-23	1	-106/+297
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/range_analysis: Make sure the table validation only occurs once	Ian Romanick	2019-11-22	1	-38/+58
|	All of the tables are static const, so they only need to be validated once.
|
|	As noted in the previous commit, the compiler should be able to eliminate all of this code when the assertions would pass. Even with the help of the previous commit, this does not always occur.
|
|	-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
|	-O1: No difference proven at 95.0% confidence. N=5
|	-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
|
|	Reviewed-by: Eric Anholt <[email protected]>
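A minimal sketch of validating static const tables only once per process, assuming a hypothetical symmetric table and a plain static flag (Mesa's actual code may differ, and a multithreaded caller would need something like C11 `call_once` instead of a bare flag):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup table that must be symmetric: t[a][b] == t[b][a]. */
static const int table[3][3] = {
   { 0, 1, 2 },
   { 1, 1, 2 },
   { 2, 2, 2 },
};

static int validation_runs = 0;

static void
validate_table_once(void)
{
   static bool validated = false;   /* not thread-safe; see above */
   if (validated)
      return;
   for (int a = 0; a < 3; a++)
      for (int b = 0; b < 3; b++)
         assert(table[a][b] == table[b][a]);
   validation_runs++;
   validated = true;
}
```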
*	nir/range-analysis: Add pragmas to help loop unrolling	Ian Romanick	2019-11-22	1	-0/+10
|	I was pretty liberal with these assertions when I wrote this code because I had assumed that GCC would unroll the loops, inline the lookups of static const arrays with now-constant indices, and then eliminate all the actual assertions. It seems none of this happens, even at -O3. Adding the pragmas helps encourage loop unrolling at some optimization levels.
|
|	I tested by running shader-db with NIR_VALIDATE=false on a Core i7 Haswell desktop system.
|
|	-Og: No difference proven at 95.0% confidence. N=5
|	-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
|	-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
|
|	v2: Add a _Pragma to an inner loop that was accidentally dropped during a rebase.
|
|	Reviewed-by: Eric Anholt <[email protected]>
*	nir: Add load_sampler_lod_paramaters_pan intrinsic	Alyssa Rosenzweig	2019-11-22	1	-0/+4
|	This loads in the <min_lod, max_lod, lod_bias> settings for a given sampler, which is necessary for lowering clamps/biases on certain Midgard chips.
|
|	Signed-off-by: Alyssa Rosenzweig <[email protected]>
|	Reviewed-by: Tomeu Vizoso <[email protected]>
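For reference, the lowering these parameters enable follows the usual GL order: add the bias to the computed LOD, then clamp the result to [min_lod, max_lod]. A scalar C sketch with a hypothetical helper name (the Midgard lowering itself operates on NIR, and the GL rules also clamp the bias itself, which is omitted here):

```c
#include <assert.h>

/* Apply a sampler's LOD bias, then clamp to its [min_lod, max_lod] range. */
static float
apply_sampler_lod(float lod, float min_lod, float max_lod, float lod_bias)
{
   float biased = lod + lod_bias;
   if (biased < min_lod)
      return min_lod;
   if (biased > max_lod)
      return max_lod;
   return biased;
}
```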
*	nir/serialize: do ctx = {0} instead of manual initializations	Marek Olšák	2019-11-21	1	-4/+2
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir: strip as we serialize to remove the nir_shader_clone call	Marek Olšák	2019-11-21	4	-133/+34
|	Serializing stripped NIR is faster now.
|
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir: fix deref offset builder	Dave Airlie	2019-11-22	1	-1/+1
|	Use the correct bit size.
|
|	Reviewed-by: Jason Ekstrand <[email protected]>
*	vtn/opencl: add clz support	Dave Airlie	2019-11-22	1	-0/+8
|	This is needed for OpenCL.
|
|	Reviewed-by: Jason Ekstrand <[email protected]>
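OpenCL's clz() returns the number of leading zero bits and is defined to return the full bit width for 0. One common way to express it with a find-MSB primitive, sketched in C for 32-bit values (helper names are hypothetical; `ufind_msb32` mirrors NIR's convention of returning -1 when no bit is set):

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, or -1 if value == 0. */
static int
ufind_msb32(uint32_t value)
{
   for (int bit = 31; bit >= 0; bit--) {
      if (value & (UINT32_C(1) << bit))
         return bit;
   }
   return -1;
}

/* clz(x) = 31 - msb(x); for x == 0 this gives 31 - (-1) = 32. */
static int
clz32(uint32_t value)
{
   return 31 - ufind_msb32(value);
}
```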
*	nir: add 64-bit ufind_msb lowering support. (v2)	Dave Airlie	2019-11-22	2	-0/+24
|	This adds the option to lower 64-bit ufind_msb opcodes.
|
|	v2: use split_x/y, which removes component loops (Jason)
|
|	Reviewed-by: Jason Ekstrand <[email protected]>
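Splitting the 64-bit value into its low and high 32-bit halves (what split_x and split_y produce) reduces the 64-bit case to two 32-bit ones. A C sketch of that reduction, assuming NIR's convention that ufind_msb returns -1 for zero; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, or -1 if value == 0. */
static int
ufind_msb32(uint32_t value)
{
   for (int bit = 31; bit >= 0; bit--) {
      if (value & (UINT32_C(1) << bit))
         return bit;
   }
   return -1;
}

/* msb64(x) = hi != 0 ? 32 + msb32(hi) : msb32(lo); zero still yields -1. */
static int
ufind_msb64(uint64_t value)
{
   uint32_t lo = (uint32_t)value;          /* like split_x */
   uint32_t hi = (uint32_t)(value >> 32);  /* like split_y */
   return hi != 0 ? 32 + ufind_msb32(hi) : ufind_msb32(lo);
}
```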
*	spirv/nir/opencl: handle some multiply instructions.	Dave Airlie	2019-11-22	1	-0/+37
|	This adds support for some missing 24-bit and hi multiply variants.
|
|	Reviewed-by: Jason Ekstrand <[email protected]>
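For reference, the OpenCL semantics of two of these variants on unsigned 32-bit operands, as a plain C sketch with hypothetical helper names. OpenCL leaves mul24 results implementation-defined when inputs exceed 24 bits; masking them, as here, is one valid choice:

```c
#include <assert.h>
#include <stdint.h>

/* mul_hi: upper 32 bits of the full 64-bit product. */
static uint32_t
umul_hi32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* mul24: multiply using only the low 24 bits of each operand, keeping the
 * low 32 bits of the product (unsigned wraparound is well defined in C). */
static uint32_t
umul24(uint32_t a, uint32_t b)
{
   return (a & 0xffffffu) * (b & 0xffffffu);
}
```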
*	nir/validate: validate num_components on registers and intrinsics	Karol Herbst	2019-11-21	1	-8/+16
|	Also make 8 and 16 components invalid. We will enable them again later, when we actually support them.
|
|	v2: fix validation of nir_intrinsic_instr::num_components
|	    correct validation of instr->num_components
|
|	Signed-off-by: Karol Herbst <[email protected]>
|	Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/large_constants: use nir_index_vars and nir_variable::index	Rhys Perry	2019-11-20	1	-12/+8
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir: add nir_variable::index and nir_index_vars	Rhys Perry	2019-11-20	2	-0/+41
|	This will be useful as a deterministic identifier/index for the variable.
|
|	v2: fix comment style
|
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]> (v1)
*	nir: make nir_variable::{num_members,num_state_slots} a uint16_t	Rhys Perry	2019-11-20	1	-2/+2
|	Doesn't shrink it (at least on x86-64) and leaves space for more members.
|
|	Signed-off-by: Rhys Perry <[email protected]>
|	Reviewed-by: Connor Abbott <[email protected]>
*	nir/lower_alu_to_scalar: Support lowering 8- and 16-bit reduce ops	Neil Roberts	2019-11-20	1	-0/+8
|	Reviewed-by: Rob Clark <[email protected]>
|	Acked-by: Alyssa Rosenzweig <[email protected]>
*	nir: Add a 8-bit bool type	Neil Roberts	2019-11-20	2	-2/+12
|	Adds nir_type_bool8 as well as 8-bit versions of all the bool opcodes.
|
|	Reviewed-by: Rob Clark <[email protected]>
|	Acked-by: Alyssa Rosenzweig <[email protected]>