mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	nir/loop_analyze: Refactor detection of limit vars	Jason Ekstrand	2019-07-18	1	-54/+51
\| \| \| \| \| \| \| \| \| \|	This commit reworks both get_induction_and_limit_vars() and try_find_trip_count_vars_in_iand to return true on success and not modify their output parameters on failure. This makes their callers significantly simpler. Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 0333649e638a38258957fd8b7e0367d73bbc7a80)
*	nir/regs_to_ssa: Handle regs in phi sources properly	Jason Ekstrand	2019-07-17	1	-2/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sources of phi instructions act as if they occur at the very end of the predecessor block, not the block in which the phi lives. In order to handle them correctly, we have to skip phi sources on the normal instruction walk and handle them as a separate walk over the successor phis. While registers in phi instructions are a bit of an oddity, they can occur when we temporarily go out-of-SSA for control-flow manipulations. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111075 Cc: [email protected] Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> (cherry picked from commit 6fb685fe4b762c8030f86895707516e2481e9ece)
*	nir,intel: Add support for lowering 64-bit nir_opt_extract_*	Jason Ekstrand	2019-07-16	2	-0/+39
\| \| \| \| \| \| \| \| \| \|	We need this when doing full software 64-bit emulation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309 Fixes: cbad201c2b3 "nir/algebraic: Add missing 64-bit extract_[iu]8..." Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 0ba508d7a3b6a006b5b8db1e865d33efc8d0abd5)
*	nir/opt_if: Clean up single-src phis in opt_if_loop_terminator	Jason Ekstrand	2019-07-16	3	-0/+16
\| \| \| \| \| \| \|	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071 Fixes: 2a74296f24ba "nir: add opt_if_loop_terminator()" Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 7a19e05e8c84152af3a15868f5ef781142ac8e23)
*	nir/loop_analyze: Bail if we encounter swizzles	Jason Ekstrand	2019-07-15	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	None of the current code knows what to do with swizzles. Take the safe option for now and bail if we see one. This does have a small shader-db impact but it is at least safe. Shader-db results on Kaby Lake: total loops in shared programs: 4364 -> 4388 (0.55%) loops in affected programs: 5 -> 29 (480.00%) helped: 5 HURT: 29 Shader-db results on Haswell: total loops in shared programs: 4373 -> 4370 (-0.07%) loops in affected programs: 5 -> 2 (-60.00%) helped: 5 HURT: 2 Fixes: 6772a17acc8ee "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 9a3cb6f5fec040dea4a229b93f789995b36f9c09)
*	nir/loop_analyze: Handle bit sizes correctly in calculate_iterations	Jason Ekstrand	2019-07-15	1	-27/+48
\| \| \| \| \| \| \| \| \| \| \|	The current code assumes everything is 32-bit which is very likely true but not guaranteed by any means. Instead, use nir_eval_const_opcode to do the calculations in a bit-size-agnostic way. We also use the new constant constructors to build the correct size constants. Fixes: 6772a17acc8ee "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 268ad47c1115be8a8444d8e0e40af71623f9d281)
*	nir: Add more helpers for working with const values	Jason Ekstrand	2019-07-15	2	-0/+135
\| \| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit ce5581e23e54be91e4c1ad6a6c5990eca6677ceb)
*	nir/loop_analyze: Fix phi-of-identical-alu detection	Jason Ekstrand	2019-07-15	1	-26/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	One issue was that the original version didn't check that swizzles matched when comparing ALU instructions so it could end up matching very different instructions. Using the nir_instrs_equal function from nir_instr_set.c which we use for CSE should be much more reliable. Another was that the loop assumes it will only run two iterations which may not be true. If there's something which guarantees that this case only happens for phis after ifs, it wasn't documented. Fixes: 9e6b39e1d521 "nir: detect more induction variables" Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 9f7ffe41dd185487479ea8846df1f5cdbf1b83a6)
*	nir/instr_set: Expose nir_instrs_equal()	Jason Ekstrand	2019-07-15	2	-59/+62
\| \| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit 6e984bcb92cf5e8b7da7387bc73cf6519ea2f43d)
*	nir: Add a helper to determine if an intrinsic can be reordered	Connor Abbott	2019-07-15	3	-11/+13
\| \| \| \| \| \| \| \|	This is simple now, but we're going to be adding a few more conditions to this later. Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit a1c737927c0d96f26ce487930aa9a2ed323814c9)
*	nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size	Ian Romanick	2019-07-09	2	-1/+218
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is important because, for example nir_op_fne has dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or 64-bits. Fixing this helps partial redundancy elimination for compares in a few more shaders. v2: Add unit tests for nir_opt_comparison_pre that are fixed by this commit. All Intel platforms had similar results. total instructions in shared programs: 17179408 -> 17179081 (<.01%) instructions in affected programs: 43958 -> 43631 (-0.74%) helped: 118 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2 helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for instructions value: -3.08 -2.37 95% mean confidence interval for instructions %-change: -1.30% -0.85% Instructions are helped. total cycles in shared programs: 360959066 -> 360942386 (<.01%) cycles in affected programs: 774274 -> 757594 (-2.15%) helped: 111 HURT: 4 helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36 helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24% HURT stats (abs) min: 1 max: 2068 x̄: 533.25 x̃: 32 HURT stats (rel) min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56% 95% mean confidence interval for cycles value: -200.61 -89.47 95% mean confidence interval for cycles %-change: -10.32% -6.58% Cycles are helped. Reviewed-by: Jason Ekstrand <[email protected]> [v1] Suggested-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Fixes: be1cc3552bc ("nir: Add nir_const_value_negative_equal") (cherry picked from commit 0ac5ff9ecb26ebc07a48e4f15539f975cef9b82a)
*	nir: Add unit tests for nir_opt_comparison_pre	Ian Romanick	2019-07-09	4	-1/+334
\| \| \| \| \| \| \| \| \| \|	Each test has a comment with the expected before and after NIR. The tests don't actually check this. The tests only check whether or not the optimization pass reported progress. I couldn't think of a robust, future-proof way to check the before and after code. Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit b08d7040518cdf76792952ceef72cadaa54d0179)
*	nir/propagate_invariant: Don't add NULL vars to the hash table	Jason Ekstrand	2019-06-06	1	-1/+10
\| \| \| \| \| \| \| \|	Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars" Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Eric Anholt <[email protected]> (cherry picked from commit d96878a66a559f6690f01e82f06fcf92ae958d3c)
*	nir: Actually propagate progress in nir_opt_move_load_ubo.	Bas Nieuwenhuizen	2019-06-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Found with Jason's new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950). Fixes: af355aaa071 "nir: add nir_opt_move_load_ubo() optimization pass" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit e24a7840f60ac2290761ea2dc2831e8c3ba8bbfc)
*	nir/dead_cf: Call instructions aren't dead	Jason Ekstrand	2019-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	When we inlined cf_node_has_side_effects into node_is_dead, all the conditions flipped and we forgot to flip one. Fortunately, it doesn't matter right now because no one uses this pass on shaders with more than one function. Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects" Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> (cherry picked from commit 8948048c6f01209bac0051e41cd84c38853bd251)
*	nir/lower_non_uniform: safely iterate over blocks	Lionel Landwerlin	2019-05-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a problem where the same instruction gets replaced twice. This was happening when the replaced instruction would be at the end of a block. Replacement of : if ssa_8 { .... intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */ } Would be : if ssa_8 { loop { vec1 32 ssa_47 = intrinsic read_first_invocation (ssa_44) () vec1 1 ssa_48 = ieq ssa_47, ssa_44 if ssa_48 { loop { vec1 32 ssa_49 = intrinsic read_first_invocation (ssa_44) () vec1 1 ssa_50 = ieq ssa_49, ssa_44 if ssa_50 { intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */ break } else { .... } Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 366811bedb67ae7d31a02ea9b1f9fa942fb93602)
*	nir: Fix clone of nir_variable state slots	Caio Marcelo de Oliveira Filho	2019-05-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	When num_state_slots is 0, don't create the array. This was triggering the following assert when running vkcube with NIR_TEST_CLONE=1 vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66: split_variable: Assertion `var->state_slots == NULL' failed. Fixes: 9fbd390dd4b "nir: Add support for cloning shaders" Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 005cc9ae37ca45960d87389dc9eace5ed29d1b99)
*	nir: Fix nir_opt_idiv_const when negatives are involved	Caio Marcelo de Oliveira Filho	2019-05-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First, allow the case for negative powers of two. Then ensure that we use the absolute value of the non-constant value to calculate the quotient -- this was hinted in the code by the name 'uq'. This fixes an issue when 'd' is positive and 'n' is negative. The ishr will propagate the negative sign and we'll use nir_ineg() again, incorrectly. v2: First version used only ishr, but that isn't sufficient, since it never can produce a zero as a result. (Jason) Allow negative powers of two. (Caio) Fixes: 74492ebad94 "nir: Add a pass for lowering integer division by constants" Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 8a995f2b5e1e3f2a2eafd32870ebfb43b5cfdf27)
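The power-of-two case described above can be sketched roughly as follows. This is an illustration of the idea only, not the pass's actual code: the helper name idiv_pow2 is hypothetical, and Python's unbounded integers sidestep the most-negative-value corner case that fixed-width code has to consider.

```python
def idiv_pow2(n, d):
    """Truncating signed division of n by d, where |d| is a power of two."""
    abs_d = abs(d)
    assert abs_d != 0 and abs_d & (abs_d - 1) == 0, "|d| must be a power of two"
    shift = abs_d.bit_length() - 1
    # Shift the absolute value of the numerator so the quotient rounds toward zero.
    uq = abs(n) >> shift
    # Negate once, and only when the signs of the operands differ.
    return -uq if (n < 0) != (d < 0) else uq


# An arithmetic shift on its own rounds toward negative infinity, so a
# negative numerator can never produce a zero quotient:
shift_only = -1 >> 1
```

Here idiv_pow2(-1, 2) gives 0, while the shift-only value stays at -1, which matches the v2 note about ishr never producing a zero.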
*	nir: lower_non_uniform_access: iterate over instructions safely	Lionel Landwerlin	2019-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This pass moves instructions around and adds control-flow in the middle of blocks. We need to use nir_foreach_instr_safe to ensure that we iterate over instructions correctly anyway. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit e04cf0b61269ca60b3260d81d94e625965d39901)
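The difference between a plain walk and a "safe" walk can be shown with a toy linked list. This is a sketch under stated assumptions: Node, foreach_safe and foreach_unsafe are hypothetical stand-ins for NIR's instruction list and iteration macros, and detaching a node stands in for the pass moving an instruction elsewhere.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def build(values):
    """Build a singly linked list and return its head."""
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head


def foreach_safe(head):
    node = head
    while node is not None:
        nxt = node.next  # cached before the loop body can unlink `node`
        yield node
        node = nxt


def foreach_unsafe(head):
    node = head
    while node is not None:
        yield node
        node = node.next  # read after the loop body ran, so it may be stale


def visit_and_detach(iterate, values):
    """Visit every node, detaching each one from the list as we go."""
    seen = []
    for node in iterate(build(values)):
        seen.append(node.value)
        node.next = None  # stands in for moving the instruction elsewhere
    return seen
```

With the safe variant every node is visited; with the unsafe one the walk stops after the first node, because the link it needs has already been cleared.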
*	nir: fix lower_non_uniform_access pass	Lionel Landwerlin	2019-05-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	Obviously missing the instruction insertion into the SSA list. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <[email protected]> (cherry picked from commit 391a836e8fb1c84170f3aa7550f0b347d31528f3)
*	Revert "nir: add late opt to turn inot/b2f combos back to bcsel"	Ian Romanick	2019-05-15	2	-19/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 7acc8652268205a266068ea4d059eccce43e1f78. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization was made basically irrelevant by 7725d609387 "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit d2a9ba03e30602f040687da325470d72eeddef1a) [Juan: resolve trivial conflicts] Signed-off-by: Juan A. Suarez Romero <[email protected]> Conflicts: src/compiler/nir/nir_opt_algebraic.py
*	nir: Add nir_op_vec helper	Karol Herbst	2019-05-04	2	-13/+13
\| \| \| \| \| \| \| \| \|	With that, we can simplify code where NIR vectors are created. v2: merge both lines in nir_vec Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add a nir_builder_alu variant which takes an array of components	Karol Herbst	2019-05-04	1	-14/+36
\| \| \| \| \| \| \|	v2: rename to nir_build_alu_src_arr Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Add a SSA type gathering pass	Jason Ekstrand	2019-05-04	3	-0/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labeled as both int and float and some will be labeled as neither, and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <[email protected]>
*	nir/algebraic: Don't emit empty initializers for MSVC	Connor Abbott	2019-05-04	1	-0/+4
\| \| \| \| \| \| \| \| \|	Just don't emit the transform array at all if there are no transforms v2: - Don't use len(array) > 0 (Dylan) - Keep using ARRAY_SIZE to make the generated C code easier to read (Jason).
*	nir: fix lower vars to ssa for larger vector sizes.	Dave Airlie	2019-05-03	1	-4/+4
\| \| \| \| \| \| \|	This has a couple of hardcoded vec4 limits in it; change them to the proper sizing to avoid future issues. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: fix nir tex print harder	Rob Clark	2019-05-02	1	-6/+5
\| \| \| \| \| \|	Fixes: 691d5a825a6 nir: rework tex instruction printing Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Rob Clark <[email protected]>
*	nir: add pass to lower fb reads	Rob Clark	2019-05-02	4	-6/+138
\| \| \| \| \|	Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	nir: fix lower_wpos_ytransform in load_frag_coord case	Rob Clark	2019-05-02	1	-10/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert after the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	nir: rework tex instruction printing	Rob Clark	2019-05-02	1	-8/+10
\| \| \| \| \| \| \|	The extra comma at the end was annoying me. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	nir/search: Add debugging code to dump the pattern matched	Connor Abbott	2019-05-02	1	-0/+75
\| \| \| \| \| \|	This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir/search: Add automaton-based pre-searching	Connor Abbott	2019-05-02	3	-19/+425
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted in the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. 
As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Saturating integer arithmetic is not associative	Ian Romanick	2019-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 8-bits, iadd_sat(iadd_sat(0x7f, 0x7f), -1) = iadd_sat(0x7f, -1) = 0x7e but, iadd_sat(0x7f, iadd_sat(0x7f, -1)) = iadd_sat(0x7f, 0x7e) = 0x7f Fixes: 272e927d0e9 ("nir/spirv: initial handling of OpenCL.std extension opcodes") Reviewed-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
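The 8-bit counterexample above can be checked with a small model of signed saturating addition. This is a minimal sketch: the Python function iadd_sat below only imitates the opcode's arithmetic by clamping to the signed range, and is not taken from the Mesa sources.

```python
def iadd_sat(a, b, bits=8):
    """Signed addition that clamps to the representable range for `bits`."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


# Group the same three operands in the two possible ways.
left = iadd_sat(iadd_sat(0x7f, 0x7f), -1)   # saturates first, then subtracts 1
right = iadd_sat(0x7f, iadd_sat(0x7f, -1))  # subtracts 1 first, then saturates
```

The two groupings give 0x7e and 0x7f respectively, so reassociating saturating adds can change the result.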
*	nir: improve convert_yuv_to_rgb	Jonathan Marek	2019-05-01	1	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Use a different arrangement of constants to allow more ffma. A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends shouldn't be hurt. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Ian Romanick <[email protected]>
*	delete autotools .gitignore files	Eric Engestrom	2019-04-29	2	-8/+0
\| \| \| \| \| \| \| \|	One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
*	nir: Add a new nir_cf_list_is_empty_block() helper.	Kenneth Graunke	2019-04-28	1	-0/+15
\| \| \| \| \| \| \|	Helper and name suggested by Eric Anholt. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	nir: add rcp(w) lowering for gl_FragCoord	Andreas Baierl	2019-04-29	3	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <[email protected]> Reviewed-by: Qiang Yu <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
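The pseudocode above amounts to the following, written here as a plain Python function on a 4-component tuple. The function name lower_fragcoord_w is hypothetical and the sketch says nothing about how the NIR pass itself is structured.

```python
def lower_fragcoord_w(frag_coord_orig):
    """Keep x, y and z as they are and replace w with its reciprocal."""
    x, y, z, w = frag_coord_orig
    return (x, y, z, 1.0 / w)


# Example: a fragment whose original w component is 4.0.
lowered = lower_fragcoord_w((1.0, 2.0, 0.5, 4.0))
```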
*	nir: use braces around subobject in initializer	Tapani Pälli	2019-04-26	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the same syntax as elsewhere in the Mesa sources; result verified against MSVC with godbolt.org. Fixes the following warning with clang: warning: suggest braces around initialization of subobject v2: empty braces -> braces around subobject (Caio, Kristian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
*	nir/algebraic: Optimize integer cast-of-cast	Jason Ekstrand	2019-04-26	1	-0/+42
\| \| \| \| \| \| \|	These have been popping up more and more with the OpenCL work and other bits causing extra conversions to/from 64-bit. Reviewed-by: Karol Herbst <[email protected]>
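Why a cast of a cast can often be collapsed is easy to check numerically. The commit text does not list the exact patterns it adds, so the sketch below only demonstrates the general property, with hypothetical helpers u2u and i2i that model unsigned and signed integer conversions by masking.

```python
def u2u(value, bits):
    """Unsigned conversion: keep the low `bits` bits."""
    return value & ((1 << bits) - 1)


def i2i(value, bits):
    """Signed conversion: keep the low `bits` bits and reinterpret the sign."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return value - (1 << bits) if value & sign else value


samples = [0, 1, 0x7f, 0x80, 0xff, 0x1ff, 0xdeadbeef, -1, -129]

# Widening to 64 bits and then narrowing matches narrowing directly.
widen_then_narrow_ok = all(
    u2u(u2u(x, 64), 32) == u2u(x, 32) and i2i(i2i(x, 64), 8) == i2i(x, 8)
    for x in samples
)

# The reverse order is not an identity: narrowing first loses the high bits.
narrow_then_widen = u2u(u2u(0x1ff, 8), 32)
```

Only the first direction is safe to fold unconditionally; in the second, 0x1ff comes back as 0xff.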
*	nir: fix bit_size in lower indirect derefs.	Dave Airlie	2019-04-26	1	-1/+1
\| \| \| \| \| \| \| \|	This fixes a case where we are expecting 64-bit but generate 32-bit consts and validate gets angry. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	freedreno/ir3: lower load_barycentric_at_offset	Rob Clark	2019-04-25	1	-0/+3
\| \| \| \| \| \| \| \| \|	Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <[email protected]>
*	freedreno/ir3: lower load_barycentric_at_sample	Rob Clark	2019-04-25	1	-0/+7
\| \| \| \| \| \| \|	This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <[email protected]>
*	nir: Add option to lower tex to txl when shader don't support implicit LOD	Caio Marcelo de Oliveira Filho	2019-04-25	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	We already add the LOD src, so go ahead and update the texop as well when this option is set. v2: Make it an option. (Rob Clark) v3: Use a more concise name suggested by Jason. Reviewed-by: Rob Clark <[email protected]>
*	nir: fix nir_remove_unused_varyings()	Timothy Arceri	2019-04-25	1	-18/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were only setting the used mask for the first component of a varying. Since the linking opts split vectors into scalars this has mostly worked ok. However, this causes an issue where, for example, if we split a struct on one side of the interface but not the other, then we can possibly end up removing the first components on the side that was split and then incorrectly remove the whole struct on the other side of the varying. With this change we simply mark all 4 components for each slot used by a struct. We could possibly make this more fine-grained but that would require a more complex change. This fixes a bug in Strange Brigade on RADV when tessellation is enabled; all credit goes to Samuel Pitoiset for tracking down the cause of the bug. Fixes: f1eb5e639997 ("nir: add component level support to remove_unused_io_vars()") Reviewed-by: Samuel Pitoiset <[email protected]>
*	nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref	Jason Ekstrand	2019-04-22	1	-14/+4
\| \| \| \| \| \| \| \|	We have a macro for this now; no reason to hand-roll it for derefs. While we're here, move the NIR_DEFINE_CAST for derefs down to where all the other ones are. Reviewed-by: Eric Anholt <[email protected]>
*	nir: Add helpers for getting the type of an address format	Jason Ekstrand	2019-04-19	1	-0/+33
\| \| \| \|	Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	intel,nir: Lower TXD with a bindless sampler	Jason Ekstrand	2019-04-19	2	-0/+8
\| \| \| \| \| \| \| \| \|	When we have a bindless sampler, we need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir/lower_io: Expose some explicit I/O lowering helpers	Jason Ekstrand	2019-04-19	2	-42/+65
\| \| \| \| \|	Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
*	nir_opcodes.py: Saturate to expression that doesn't overflow	Kristian H. Kristensen	2019-04-19	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compiler warns about overflow when assigning UINT64_MAX to something smaller than a uint64_t: src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion] uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1); ~~~ ^~~~~~~~~~ Shift UINT64_MAX down to the appropriate maximum value for the type being assigned to. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
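The idea of shifting UINT64_MAX down to the destination width can be modelled as below. This is a sketch only: the exact expression used in nir_opcodes.py is not quoted in the message above, so uint_max and uadd_sat are hypothetical helpers, and the mask emulates fixed-width wraparound that C would perform implicitly.

```python
UINT64_MAX = (1 << 64) - 1


def uint_max(bits):
    """Largest unsigned value for `bits`, derived by shifting UINT64_MAX down."""
    return UINT64_MAX >> (64 - bits)


def uadd_sat(src0, src1, bits):
    """Unsigned saturating add at the given bit width."""
    total = (src0 + src1) & uint_max(bits)  # emulate fixed-width wraparound
    # A wrapped sum is smaller than either operand, which signals overflow.
    return uint_max(bits) if total < src0 else total
```

For a 1-bit destination the saturated value is 1 rather than a truncated 64-bit constant, which is the situation the warning was pointing at.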
*	nir: Use the nir_builder _imm helpers in setting up deref offsets.	Eric Anholt	2019-04-19	1	-4/+3
\| \| \| \| \| \| \| \| \| \|	When looking at the dEQP nested_struct_array_dynamic_index_fragment code after lowering, I was horrified at the amount of adding and multiplying by 0 we were doing. The builder _imm helpers handle that for you so that the following optimization passes have less work to do. Plus, it's easier to read. Reviewed-by: Jason Ekstrand <[email protected]>
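How immediate-aware helpers avoid emitting adds and multiplies by zero can be illustrated with a toy expression builder. Everything here is an assumption for illustration: the tuple representation and the functions imm, iadd_imm, imul_imm and deref_offset are hypothetical and only loosely modelled on the idea of the nir_builder _imm helpers, not on their actual implementation.

```python
def imm(value):
    """A constant operand in the toy representation."""
    return ("imm", value)


def iadd_imm(x, value):
    # Adding zero is a no-op, so hand back the operand unchanged.
    if value == 0:
        return x
    return ("iadd", x, imm(value))


def imul_imm(x, value):
    # Multiplying by zero yields a constant; multiplying by one is a no-op.
    if value == 0:
        return imm(0)
    if value == 1:
        return x
    return ("imul", x, imm(value))


def deref_offset(index, stride, base):
    """Compute index * stride + base with trivial operations folded away."""
    return iadd_imm(imul_imm(index, stride), base)
```

With a stride of 1 and a base of 0, deref_offset("ssa_1", 1, 0) returns "ssa_1" itself, so later optimization passes have nothing to clean up.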