path: root/src/compiler
Commit message | Author | Age | Files | Lines
* nir/spirv: return after emitting a branch in block [cros-mesa-19.0-r1-vanilla, chadv/cros-mesa-19.0-r1-vanilla] | Juan A. Suarez Romero | 2019-02-28 | 1 | -0/+1
  When emitting a branch in a block, it does not make sense to continue processing further instructions, as they will not be reachable.

  This fixes a nasty case with a loop with a branch whose then-part and else-part both exit the loop:

      %1 = OpLabel
      OpLoopMerge %2 %3 None
      OpBranchConditional %false %2 %2

      %3 = OpLabel
      OpBranch %1

      %2 = OpLabel
      [...]

  We know that block %1 will always branch to block %2, which is the merge block for the loop, and thus a break is emitted. If we keep processing further instructions, we will process the branch conditional and emit the corresponding NIR conditional, which leads to instructions after the break.

  This fixes dEQP-VK.graphicsfuzz.continue-and-merge.

  CC: Jason Ekstrand
  Reviewed-by: Jason Ekstrand
* glsl: fix shader cache for packed param list | Timothy Arceri | 2019-02-28 | 1 | -11/+4
  Some types of params such as some builtins are always padded. We need to keep track of this so we can restore the list correctly.

  Here we also remove a couple of cache entries that are not actually required as they get rebuilt by the _mesa_add_parameter() calls.

  This patch fixes a bunch of arb_texture_multisample and arb_sample_shading piglit tests for the radeonsi NIR backend.

  Fixes: edded1237607 ("mesa: rework ParameterList to allow packing")
  Reviewed-by: Marek Olšák
* nir: Add possibility to not lower to source mod 'abs' for ops with three sources | Gert Wollny | 2019-02-27 | 2 | -1/+8
  This is useful for r600 since there the abs source modifier is not supported for ops with three sources.

  v2: Use correct logic to enable lowering to abs source mod (Eric Anholt)

  Signed-off-by: Gert Wollny
  Reviewed-by: Eric Anholt
* nir/lower_tex: Add support for XYUV lowering | Kasireddy, Vivek | 2019-02-26 | 2 | -0/+21
  The memory layout associated with this format would be:

      Byte:      0 1 2 3
      Component: V U Y X

  Signed-off-by: Vivek Kasireddy
  Reviewed-by: Lionel Landwerlin
  Reviewed-by: Tapani Pälli
  Reviewed-by: Eric Engestrom
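The byte layout above can be illustrated with a small sketch. This is not code from the patch: the helper name `unpack_xyuv` is hypothetical, and it only mirrors the Byte/Component table (byte 0 = V, byte 1 = U, byte 2 = Y, byte 3 = X).

```python
def unpack_xyuv(texel: bytes) -> dict:
    """Split one 4-byte XYUV texel into named components.

    Assumes the memory layout from the commit message:
    byte 0 = V, byte 1 = U, byte 2 = Y, byte 3 = X.
    """
    if len(texel) != 4:
        raise ValueError("expected exactly 4 bytes")
    v, u, y, x = texel  # iterating over bytes yields ints
    return {"y": y, "u": u, "v": v, "x": x}


if __name__ == "__main__":
    print(unpack_xyuv(bytes([0x80, 0x40, 0xEB, 0xFF])))
```

The X byte is returned for completeness; the commit message does not describe how it is used.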
* nir: initialize value in copy_prop_vars_block | Tapani Pälli | 2019-02-26 | 1 | -1/+1
  Fixes the following valgrind warning:

      ==27561== Conditional jump or move depends on uninitialised value(s)
      ==27561==    at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
      ==27561==    by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)

  Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
  Signed-off-by: Tapani Pälli
  Reviewed-by: Lionel Landwerlin
  Reviewed-by: Caio Marcelo de Oliveira Filho
* nir: Just return when asked to rewrite uses of an SSA def to itself. | Eric Anholt | 2019-02-25 | 1 | -1/+2
  The nir_builder swizzling improvement to not emit extra MOVs resulted in nir_lower_tex() trying to rewrite an SSA def to itself, triggering the assert on all texturing in v3d. There's no work to be done in this case, so just stop asserting.

  Fixes: 743700be1f58 ("nir/builder: Don't emit no-op swizzles")
  Reviewed-by: Jason Ekstrand
* nir: Use SM5 properties to optimize shift(a@32, iand(31, b)) | Daniel Schürmann | 2019-02-25 | 1 | -0/+5
  This is a common pattern from HLSL->SPIRV translation and supported in HW by all current NIR backends.

  vkpipeline-db results anv (SKL):

      total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
      instructions in affected programs: 204084 -> 203334 (-0.37%)
      helped: 208
      HURT: 0

      total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
      cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
      helped: 107
      HURT: 86

  shader-db results on i965 (KBL):

      total instructions in shared programs: 15284592 -> 15284568 (<.01%)
      instructions in affected programs: 81683 -> 81659 (-0.03%)
      helped: 24
      HURT: 0

      total cycles in shared programs: 375013622 -> 375013932 (<.01%)
      cycles in affected programs: 40169618 -> 40169928 (<.01%)
      helped: 13
      HURT: 9

  Reviewed-by: Jason Ekstrand
* nir: Define shifts according to SM5 specification. | Daniel Schürmann | 2019-02-25 | 1 | -4/+6
  SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts are defined to only use the least significant bits.

  Reviewed-by: Jason Ekstrand
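As a minimal sketch of the SM5 rule described above, assuming 32-bit operands: only the low 5 bits of the shift count are used, which is also why an explicit `iand(31, b)` on the count (the pattern in the previous entry) becomes redundant. The function names are illustrative and are not NIR opcodes or Mesa APIs.

```python
MASK32 = 0xFFFFFFFF


def ishl32(a: int, b: int) -> int:
    """32-bit left shift that only looks at the low 5 bits of the count."""
    return (a << (b & 31)) & MASK32


def ushr32(a: int, b: int) -> int:
    """32-bit logical right shift with the same count masking."""
    return (a & MASK32) >> (b & 31)


if __name__ == "__main__":
    # A count of 33 behaves like a count of 1 under these semantics.
    print(ishl32(1, 33), ushr32(0x80000000, 33))
```

Under these semantics `ishl32(a, b & 31)` and `ishl32(a, b)` always agree, so the extra mask can be dropped without changing the result.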
* glsl: Fix function return typechecking | Oscar Blumberg | 2019-02-25 | 1 | -1/+2
  apply_implicit_conversion only converts and checks base types, but we need actual type equality for function returns; otherwise you can return a vec2 from a function declared as returning a float.

  Reviewed-by: Tapani Pälli
* nir/builder: Don't emit no-op swizzles | Jason Ekstrand | 2019-02-24 | 1 | -1/+9
  The nir_swizzle helper is sometimes used on its own, but it's also called by nir_channel and nir_channels, which are used everywhere. It's pretty quick to check while we're walking the swizzle anyway whether or not it's an identity swizzle. If it is, we now don't bother emitting the instruction. Sure, copy-prop will clean it up for us, but there's no sense making more work for the optimizer than we have to.

  Reviewed-by: Ian Romanick
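The identity check described above can be sketched as follows. This is a simplified model rather than the nir_builder code: the swizzle is a plain list of source-component indices, and `is_identity_swizzle` is a hypothetical helper name.

```python
def is_identity_swizzle(swizzle, num_src_components):
    """Return True when applying the swizzle would reproduce the source unchanged.

    That requires selecting every source component, in order:
    same component count, and swizzle[i] == i for every i.
    """
    if len(swizzle) != num_src_components:
        return False
    return all(component == i for i, component in enumerate(swizzle))


if __name__ == "__main__":
    print(is_identity_swizzle([0, 1, 2, 3], 4))  # .xyzw of a vec4
    print(is_identity_swizzle([0, 1], 4))        # .xy of a vec4 narrows the value
```

Note that `.xy` of a vec4 is not an identity even though each index matches its position, because the result has fewer components than the source.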
* nir/split_vars: Don't compact vectors unnecessarily | Jason Ekstrand | 2019-02-24 | 1 | -0/+6
  Reviewed-by: Alejandro Piñeiro
* nir: fix MSVC build | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+1
  Zero initialize struct with {0} instead of {}.
* nir/copy_prop_vars: add tests for load/store elements of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -0/+139
  Test using array deref on vectors in loads and stores. These are marked DISABLED_ as this optimization is currently not done.

  Reviewed-by: Jason Ekstrand
* nir: nir_build_deref_follower accept array derefs of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+3
  Code itself already supports it, just make sure we can use it for those cases.

  Reviewed-by: Jason Ekstrand
* nir/copy_prop_vars: change test helper to get intrinsics | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -83/+56
  Replace find_next_intrinsic(intrinsic, after) with get_intrinsic(intrinsic, index). This makes it slightly more convenient to check the resulting loads/stores/copies, since in most tests we know which one we care about. The cost is to perform more traversals, but for such tests this is not a problem.

  Added the ASSERT_EQ() on count to some tests missing it, so the indices queried are always expected to find something.

  Also, drop two leftover nir_print_shader calls in a test.

  v2: Remove redundant assertions. nir_src_comp_as_uint already asserts what we need. (Jason)

  Reviewed-by: Jason Ekstrand
* nir/copy_prop_vars: keep track of components in copy_entry | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -33/+48
  When a copy_entry is SSA, store not only the nir_ssa_def* for each component, but also the source component they come from.

  At the moment this is always a match (i.e. 'component[i] == i'), because all the operations for a copy_entry happen using definitions with the same size. This prepares the code for array_derefs of vectors, in which 'component[i] != i'.

  Also, extract setting all SSA components into a function of its own.

  Reviewed-by: Jason Ekstrand
* nir/copy_prop_vars: add debug helpers | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -1/+87
  Disabled by default, to be used during development. Adding those so I don't rewrite some ad-hoc version of them every time I'm working with this pass.

  Reviewed-by: Jason Ekstrand
* nir/copy_prop_vars: don't get confused by array_deref of vectors | Caio Marcelo de Oliveira Filho | 2019-02-22 | 1 | -0/+28
  For now these derefs are not handled, so don't let these get into the copies list -- which would cause wrong propagations.

  For load_derefs, do nothing. For store_derefs, invalidate whatever the store is writing to. For copy_derefs, invalidate whatever the copy is writing to.

  These cases will happen once derefs to SSBOs/UBOs are kept around long enough to get optimized by copy_prop_vars.

  Reviewed-by: Jason Ekstrand
* nir: allow nir_lower_phis_to_scalar() on more src types | Timothy Arceri | 2019-02-23 | 1 | -3/+13
  Rather than only lowering if all srcs are scalarizable we instead check that at least one src is scalarizable.

  We change undef type to return false otherwise it will cause regressions when it is the only scalarizable src.

      total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
      instructions in affected programs: 1153797 -> 959239 (-16.86%)
      helped: 581
      HURT: 74

      total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
      cycles in affected programs: 129809402 -> 120648352 (-7.06%)
      helped: 571
      HURT: 131

      total spills in shared programs: 57947 -> 29130 (-49.73%)
      spills in affected programs: 53364 -> 24547 (-54.00%)
      helped: 351
      HURT: 0

      total fills in shared programs: 51310 -> 25468 (-50.36%)
      fills in affected programs: 44882 -> 19040 (-57.58%)
      helped: 351
      HURT: 0

  Reviewed-by: Jason Ekstrand
* nir: clone instruction set rather than removing individual entries | Timothy Arceri | 2019-02-22 | 1 | -3/+3
  This reduces the time spent in nir_opt_cse() by almost a half.

  The massif tool from valgrind reported no change in peak memory use with the large dolphin uber shaders I used for testing.

  Reviewed-by: Thomas Helland
  Reviewed-by: Jason Ekstrand
* nir/lower_clip_cull: Fix an incorrect assert | Jason Ekstrand | 2019-02-21 | 1 | -1/+1
  Copy+paste error. It was supposed to test cull and not clip.

  Fixes: 4e69fba534e "nir: Rewrite lower_clip_cull_distance_arrays..."
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
  Reviewed-by: Lionel Landwerlin
* nir: Fix a compile warning | Jason Ekstrand | 2019-02-21 | 1 | -1/+1
* nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs | Alejandro Piñeiro | 2019-02-21 | 12 | -44/+26
  On GLSL that info is set as a layout qualifier when redeclaring gl_FragCoord, so it is somehow tied to a specific variable. But in practice, they behave as a global of the shader. On ARB programs they are set using a global OPTION (defined at ARB_fragment_coord_conventions), and on SPIR-V using ExecutionModes, which are also not tied specifically to the builtin.

  This patch moves that info from nir variable and ir variable to nir shader and gl_program shader_info respectively, so the map is more similar to SPIR-V and ARB programs, instead of more similar to GLSL.

  FWIW, shader_info.fs already had pixel_center_integer, so this change also removes some redundancy. Also, as struct gl_program also includes a shader_info, we removed gl_program::OriginUpperLeft and PixelCenterInteger, as they would be superfluous.

  This change was needed because recently spirv_to_nir changed the order in which execution modes and variables are handled, so the variables didn't get the correct values. Now the info is set on the shader itself, and we don't need to go back to the builtin variable to set it.

  Fixes: e68871f6a ("spirv: Handle constants and types before execution modes")

  v2: (Jason)
   * glsl_to_nir: get the info before glsl_to_nir, while all the rest of the info gathering is happening
   * prog_to_nir: gather the info on a general info-gathering pass, not on variable setup.

  v3: (Jason)
   * Squash with the patch that removes that info from ir variable
   * anv: assert that OriginUpperLeft is true. It should be already set by spirv_to_nir.
   * blorp: set origin_upper_left on its core "compile fragment shader", not just on some specific places (for this we added a helper on a previous patch).
   * prog_to_nir: no need to gather specifically these fragcoord modes as the full gl_program shader_info is copied.
   * spirv_to_nir: assert that we are a fragment shader when handling these execution modes.

  v4: (reported by failing gitlab pipeline #18750)
   * state_tracker: update too, due to changes on ir.h/gl_program

  v5:
   * blorp: minor change after change on previous patch
   * radeonsi: update due to this change.

  v6: (Timothy Arceri)
   * prog_to_nir: remove extra whitespace
   * shader_info: don't use :1 on origin_upper_left
   * glsl: program.fs.origin_upper_left/pixel_center_integer can be moved out of the shader list loop
* nir/xfb: Handle compact arrays in gather_xfb_info | Jason Ekstrand | 2019-02-21 | 1 | -11/+22
  This makes us properly handle gl_ClipDistance and gl_CullDistance.

  Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
  Reviewed-by: Alejandro Piñeiro
* nir/xfb: Work in terms of components rather than slots | Jason Ekstrand | 2019-02-21 | 1 | -5/+5
  We needed to better handle cases where a chunk of a variable starts at some non-zero location_frac and rolls over into the next slot but may not be more than 4 dwords. For example, if gl_CullDistance is an array of 3 things and has location_frac = 2, it will span across two vec4s but is not, itself, bigger than a vec4. If you ignore the clip/cull special case, it's not allowed to happen for anything else because the only things that can span more than one slot are dvec3 and dvec4 and they're both bigger than a vec4.

  The current code uses this attrib_slot thing where we count attribute slots and iterate over them. However, that doesn't work in the case above because gl_CullDistance will have an attrib_slot count of 1 even though it does span two slots. We could fix this by adjusting attrib_slot but we already have comp_mask and it's easier to just handle it that way.

  Reviewed-by: Alejandro Piñeiro
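The gl_CullDistance example above can be worked through with a small sketch that builds a component mask and peels it off four components (one vec4 slot) at a time. This is only an illustration of the arithmetic, not the code in nir_gather_xfb_info.c; `split_components_by_slot` is a hypothetical name.

```python
def split_components_by_slot(num_components, location_frac):
    """Return (slot_offset, 4-bit component mask) pairs for a run of components.

    The run starts at component `location_frac` of slot 0 and may roll
    over into following slots, each slot holding 4 components.
    """
    comp_mask = ((1 << num_components) - 1) << location_frac
    slots = []
    slot = 0
    while comp_mask:
        slot_mask = comp_mask & 0xF  # components that land in this slot
        if slot_mask:
            slots.append((slot, slot_mask))
        comp_mask >>= 4
        slot += 1
    return slots


if __name__ == "__main__":
    # 3 components starting at location_frac = 2: .zw of slot 0, .x of slot 1.
    for slot, mask in split_components_by_slot(3, 2):
        print(slot, format(mask, "04b"))
```

Counting attribute slots by size alone would report one slot here, while the mask walk finds two, which is the mismatch the commit message describes.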
* nir: Rewrite lower_clip_cull_distance_arrays to do a lot less lowering | Jason Ekstrand | 2019-02-21 | 1 | -113/+19
  Instead of going to all the work of combining them into one array, just make two arrays and use location_frac to colocate them within CLIP0. Then the back-end can sort things out and stack them on top of each other. Thanks to ef99f4c8, we also don't need to set compact anymore.

  Reviewed-by: Alejandro Piñeiro
  Reviewed-by: Kenneth Graunke
* nir/xfb: Properly align 64-bit values | Jason Ekstrand | 2019-02-21 | 1 | -0/+4
  Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
  Reviewed-by: Alejandro Piñeiro
* compiler/types: Add a contains_64bit helper | Jason Ekstrand | 2019-02-21 | 4 | -0/+29
  Reviewed-by: Alejandro Piñeiro
* nir: remove non-ssa support from nir_copy_prop() | Timothy Arceri | 2019-02-21 | 1 | -36/+5
  Even in a very basic shader this reduces the time spent in nir_copy_prop() by ~17%.

  No shader-db changes for radeonsi NIR or i965.

  Reviewed-by: Jason Ekstrand
* nir: Don't forget if-uses in new nir_opt_dead_cf liveness check | Kenneth Graunke | 2019-02-20 | 1 | -0/+10
  Commit 08bfd710a25c14df5f690cce9604617536d7c560 (nir/dead_cf: Stop relying on liveness analysis) introduced a new check that iterated through an SSA def's uses, to see if it's used. But it only checked normal uses, and not uses which are part of an 'if' condition. This led to it thinking more nodes were dead than actually were.

  Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test (and related tests) with the out-of-tree Iris driver.

  Fixes: 08bfd710a25 nir/dead_cf: Stop relying on liveness analysis
  Reviewed-by: Connor Abbott
  Reviewed-by: Lionel Landwerlin
  Reviewed-by: Jason Ekstrand
* compiler: Make is_64bit(GL_*) helper more broadly available | Kenneth Graunke | 2019-02-19 | 1 | -26/+2
  I'd like to use this in the prog_parameter.c code, so I need to move it into C, make it non-static, and so on. This probably isn't the ideal place for it, but I couldn't think of a better one.

  Acked-by: Timothy Arceri
* nir: Don't reassociate add/mul chains containing only constants | Kenneth Graunke | 2019-02-16 | 1 | -5/+5
  The idea here is to reassociate a * (b * c) into (a * c) * b, when b is a non-constant value, but a and c are constants, allowing them to be combined.

  But nothing was enforcing that 'b' must be non-constant, which meant that running opt_algebraic in a loop would never terminate if the IR contained non-folded constant expressions like 256 * 0.5 * 2.

  Normally, we call constant folding in such a loop too, but IMO it's better for nir_opt_algebraic to be robust and not rely on that.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
  Fixes: 32e266a9a58 i965: Compile fp64 funcs only if we do not have 64-bit hardware support
  Reviewed-by: Ian Romanick
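The non-termination described above can be reproduced with a toy rewriter. This is a sketch under stated assumptions, not nir_opt_algebraic's matcher: expressions are nested tuples `("mul", x, y)`, numbers are constants, strings are non-constant values, and the matcher tries both operand orders because multiplication is commutative. All names are hypothetical.

```python
def is_const(e):
    return isinstance(e, (int, float))


def is_mul(e):
    return isinstance(e, tuple) and len(e) == 3 and e[0] == "mul"


def reassociate(expr, require_non_const_b):
    """Rewrite a * (b * c) into (a * c) * b when a and c are constants.

    Returns the rewritten expression, or None if the rule does not apply.
    """
    if not is_mul(expr):
        return None
    _, x, y = expr
    for a, inner in ((x, y), (y, x)):      # mul is commutative
        if is_const(a) and is_mul(inner):
            _, p, q = inner
            for b, c in ((p, q), (q, p)):  # so is the inner mul
                if is_const(c) and not (require_non_const_b and is_const(b)):
                    return ("mul", ("mul", a, c), b)
    return None


def rewrite_steps(expr, require_non_const_b, limit=10):
    """Apply the rule until it stops matching, giving up after `limit` steps."""
    steps = 0
    while steps < limit:
        new_expr = reassociate(expr, require_non_const_b)
        if new_expr is None:
            break
        expr = new_expr
        steps += 1
    return steps


if __name__ == "__main__":
    all_const = ("mul", 256, ("mul", 0.5, 2))
    print(rewrite_steps(all_const, require_non_const_b=False))  # hits the limit
    print(rewrite_steps(all_const, require_non_const_b=True))   # never matches
    print(rewrite_steps(("mul", 256, ("mul", "x", 2)), require_non_const_b=True))
```

Without the non-constant requirement on `b`, each rewrite of the all-constant chain produces another expression that matches again, so only the step limit stops the loop; with the requirement, the rule fires once on the mixed chain and then stops.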
* nir: remove simple dead if detection from nir_opt_dead_cf() | Timothy Arceri | 2019-02-16 | 1 | -7/+2
  This was probably useful when it was first written, however it looks to be no longer necessary.

  As far as I can tell these days dce is smart enough to remove useless instructions from if branches. Once this is done nir_opt_peephole_select() will end up removing the empty if.

  Removing this support reduces the dolphin uber shader compilation time spent in nir_opt_dead_cf() by a little over 7x.

  No shader-db changes on i965 or radeonsi.

  Tested-by: Dieter Nützel
  Reviewed-by: Jason Ekstrand
  Reviewed-by: Connor Abbott
* nir/algebraic: Simplify comparison with sequential integers starting with 0 | Ian Romanick | 2019-02-15 | 1 | -0/+5
  All of the affected shaders are Unreal4 demos.

  All Gen6+ platforms had similar results. (Skylake shown)

      total instructions in shared programs: 15437170 -> 15437001 (<.01%)
      instructions in affected programs: 21536 -> 21367 (-0.78%)
      helped: 43
      HURT: 0
      helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4
      helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80%
      95% mean confidence interval for instructions value: -4.07 -3.79
      95% mean confidence interval for instructions %-change: -0.83% -0.77%
      Instructions are helped.

      total cycles in shared programs: 383007896 -> 383007378 (<.01%)
      cycles in affected programs: 158640 -> 158122 (-0.33%)
      helped: 38
      HURT: 4
      helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6
      helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19%
      HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2
      HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08%
      95% mean confidence interval for cycles value: -16.90 -7.77
      95% mean confidence interval for cycles %-change: -0.39% -0.19%
      Cycles are helped.

  Iron Lake and GM45 had similar results. (Iron Lake shown)

      total instructions in shared programs: 8213746 -> 8213745 (<.01%)
      instructions in affected programs: 127 -> 126 (-0.79%)
      helped: 1
      HURT: 0

      total cycles in shared programs: 187734146 -> 187734144 (<.01%)
      cycles in affected programs: 2132 -> 2130 (-0.09%)
      helped: 1
      HURT: 0

  Reviewed-by: Jason Ekstrand
* nir/algebraic: Convert some f2u to f2i | Ian Romanick | 2019-02-15 | 1 | -0/+13
  Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says:

      It is undefined to convert a negative floating-point value to an uint.

  Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed.

  This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different".

  v2: Replicate part of the commit message as a comment in the code. Suggested by Jason.

  shader-db results comparing *before* "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit:

  Skylake

      total cycles in shared programs: 383007996 -> 383007896 (<.01%)
      cycles in affected programs: 85208 -> 85108 (-0.12%)
      helped: 13
      HURT: 8
      helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6
      helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14%
      HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3
      HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07%
      95% mean confidence interval for cycles value: -9.31 -0.21
      95% mean confidence interval for cycles %-change: -0.24% <.01%
      Cycles are helped.

  Broadwell

      total cycles in shared programs: 415251194 -> 415251370 (<.01%)
      cycles in affected programs: 83750 -> 83926 (0.21%)
      helped: 7
      HURT: 13
      helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12
      helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30%
      HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22
      HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47%
      95% mean confidence interval for cycles value: 0.76 16.84
      95% mean confidence interval for cycles %-change: <.01% 0.37%
      Inconclusive result (%-change mean confidence interval includes 0).

  Haswell

      total instructions in shared programs: 13823885 -> 13823886 (<.01%)
      instructions in affected programs: 2249 -> 2250 (0.04%)
      helped: 0
      HURT: 1

      total cycles in shared programs: 390094243 -> 390094001 (<.01%)
      cycles in affected programs: 85640 -> 85398 (-0.28%)
      helped: 15
      HURT: 6
      helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18
      helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42%
      HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2
      HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04%
      95% mean confidence interval for cycles value: -17.36 -5.69
      95% mean confidence interval for cycles %-change: -0.44% -0.14%
      Cycles are helped.

  Ivy Bridge

      total cycles in shared programs: 180986448 -> 180986552 (<.01%)
      cycles in affected programs: 34835 -> 34939 (0.30%)
      helped: 0
      HURT: 10
      HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10
      HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30%
      95% mean confidence interval for cycles value: 4.67 16.13
      95% mean confidence interval for cycles %-change: 0.20% 0.35%
      Cycles are HURT.

  Sandy Bridge

      total cycles in shared programs: 154603969 -> 154603970 (<.01%)
      cycles in affected programs: 171514 -> 171515 (<.01%)
      helped: 25
      HURT: 14
      helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1
      helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04%
      HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3
      HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11%
      95% mean confidence interval for cycles value: -0.91 0.96
      95% mean confidence interval for cycles %-change: -0.02% 0.04%
      Inconclusive result (value mean confidence interval includes 0).

  No changes on Iron Lake or GM45.

  Reviewed-by: Jason Ekstrand
* nir: remove jump from two merging jump-ending blocks | Juan A. Suarez Romero | 2019-02-15 | 1 | -2/+19
  In the opt_peel_initial_if optimization, when moving the continue list to the end of the continue block, before the jump, it can happen that the continue list itself also ends with a jump.

  This would mean that we would have two jump instructions in a row: the first one from the continue list and the second one from the continue block.

  As inserting an instruction after a jump is not allowed (and it does not make sense, as it will not be executed), remove the jump from the continue block and keep the one from the continue list, as it will be executed first.

  CC: Jason Ekstrand
  Reviewed-by: Caio Marcelo de Oliveira Filho
* nir: move ALU instruction before the jump instruction | Juan A. Suarez Romero | 2019-02-15 | 1 | -1/+1
  opt_split_alu_of_phi moves ALU instruction to the end of continue block.

  But if the continue block ends with a jump instruction (an explicit "continue" instruction) then the ALU must be inserted before the jump, as it is illegal to add instructions after the jump.

  CC: Ian Romanick
  Fixes: 0881e90c099 ("nir: Split ALU instructions in loops that read phis")
  Reviewed-by: Ian Romanick
* nir/dead_cf: Stop relying on liveness analysis | Jason Ekstrand | 2019-02-14 | 1 | -21/+39
  The liveness analysis pass is fairly expensive because it has to build large bit-sets and run a fix-point algorithm on them. Instead of requiring liveness for detecting if values escape a CF node, just take advantage of the structured nature of NIR and use block indices instead. This only requires the block index metadata, which is the fastest metadata we have to generate.

  No shader-db changes on Kaby Lake

  Reviewed-by: Timothy Arceri
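The block-index idea above can be sketched in a few lines. This is a simplified model and an assumption about the approach, not the pass itself: blocks are numbered in program order, `before_index` and `after_index` are the indices of the blocks immediately surrounding the control-flow node, and a value escapes the node if any use sits at or outside those bounds. The function name is hypothetical.

```python
def def_only_used_in_cf_node(use_block_indices, before_index, after_index):
    """Return True if every use lies strictly between the surrounding blocks.

    In structured control flow with blocks indexed in program order, a use
    with an index <= before_index or >= after_index is outside the node.
    """
    return all(before_index < index < after_index for index in use_block_indices)


if __name__ == "__main__":
    # A node whose blocks have indices 4..6, surrounded by blocks 3 and 7.
    print(def_only_used_in_cf_node([4, 6], before_index=3, after_index=7))
    print(def_only_used_in_cf_node([5, 7], before_index=3, after_index=7))
```

A real check would also have to consider uses that are not ordinary instruction sources, such as a value used as an `if` condition, which is the gap the later "Don't forget if-uses" entry in this log addresses.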
* nir/dead_cf: Inline cf_node_has_side_effects | Jason Ekstrand | 2019-02-14 | 1 | -41/+32
  We want to handle live SSA values differently and it's going to involve walking the instructions. We can make it a single instruction walk if we combine it with cf_node_has_side_effects.

  Reviewed-by: Timothy Arceri
* nir: Silence a couple of warnings in release builds | Jason Ekstrand | 2019-02-14 | 2 | -1/+3
  [28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
  ../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
  ../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
      unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
               ^~~~~~~~~~
  [36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
  ../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
   instr_each_src_and_dest_is_ssa(nir_instr *instr)
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Reviewed-by: Caio Marcelo de Oliveira Filho
* spirv: Eliminate dead input/output variables after translation. | Kenneth Graunke | 2019-02-14 | 1 | -5/+20
  spirv_to_nir can generate input/output variables which are illegal for the current shader stage, which would cause nir_validate_shader to balk.

  After my recent commit to start decorating arrays as compact, dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started hitting validation errors due to outputs in a TCS (not intended for the TCS at all) not being per-vertex arrays.

  Thanks to Jason Ekstrand for suggesting this approach.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
  Fixes: ef99f4c8d17 compiler: Mark clip/cull distance arrays as compact before lowering.
  Reviewed-by: Juan A. Suarez
* spirv: Add missing break | Ian Romanick | 2019-02-14 | 1 | -0/+1
  Reviewed-by: Caio Marcelo de Oliveira Filho
  Reviewed-by: Jason Ekstrand
  Fixes: c6465fec0c5 ("spirv: add SpvCapabilityInt64Atomics")
  CID: 1442555
* nir: Move panfrost's isign lowering to nir_opt_algebraic. | Eric Anholt | 2019-02-14 | 2 | -0/+5
  I wanted to reuse this from v3d.

  Reviewed-by: Alyssa Rosenzweig
  Reviewed-by: Ian Romanick
* nir: turn an ssa check in nir_search into an assert | Timothy Arceri | 2019-02-14 | 1 | -2/+1
  Everything should be in ssa form when we call this. This is a hotpath so replace the check with an assert.

  Reviewed-by: Connor Abbott
* nir: turn ssa check into an assert | Timothy Arceri | 2019-02-14 | 1 | -3/+11
  Everything should be in ssa form when this is called. Checking for it here is expensive so turn this into an assert instead.

  Reviewed-by: Connor Abbott
* nir: prehash instruction in nir_instr_set_add_or_rewrite() | Timothy Arceri | 2019-02-14 | 1 | -4/+5
  There is no need to hash the instruction twice, especially as we end up adding it in the majority of cases.

  Reviewed-by: Connor Abbott
  Reviewed-by: Jason Ekstrand
* nir: fix example in opt_peel_loop_initial_if description | Caio Marcelo de Oliveira Filho | 2019-02-12 | 1 | -3/+3
  Reviewed-by: Jason Ekstrand
* nir/opt_if: don't mark progress if nothing changes | Karol Herbst | 2019-02-13 | 1 | -0/+7
  If we have something like this:

      loop {
         ...
         if x {
            break;
         } else {
            continue;
         }
      }

  opt_if_loop_last_continue returns true, marking progress although nothing changes.

  Fixes: 5921a19d4b0c6 "nir: add if opt opt_if_loop_last_continue()"
  Signed-off-by: Karol Herbst
  Reviewed-by: Jason Ekstrand
* nir: add option to use scaling factor when sampling planes YUV lowering | Tapani Pälli | 2019-02-12 | 2 | -21/+35
  Patch adds nir_lower_tex_options as parameter to sample_plane so that we don't need to extend nir_tex_instr for this.

  Signed-off-by: Tapani Pälli
  Reviewed-by: Lionel Landwerlin
  Reviewed-by: Jason Ekstrand
* nir: Gather texture bitmasks in gl_nir_lower_samplers_as_deref. | Kenneth Graunke | 2019-02-11 | 3 | -7/+27
  Eric and I would like a bitmask of which samplers are used, similar to prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed for resource limit checking, but later optimizations may eliminate more samplers. So instead of propagating it through, we gather a new one. While there, we also gather the existing textures_used_by_txf bitmask.

  Gathering these bitfields in nir_shader_gather_info is awkward at best. The main reason is that it introduces an ordering dependency between the two passes. If gathering runs before lower_samplers_as_deref, it can't look at var->data.binding. If the driver doesn't use the full lowering to texture_index/texture_array_size (like radeonsi), then the gathering can't use those fields. Gathering might be run early /and/ late, first to get varying info, and later to update it after variant lowering. At this point, should gathering work on pre-lowered or post-lowered code? Pre-lowered is also harder due to the presence of structure types.

  Just doing the gathering when we do the lowering alleviates these ordering problems. This fixes ordering issues in i965 and makes the txf info gathering work for radeonsi (though they don't use it).

  Reviewed-by: Eric Anholt