path: root/src/compiler
Commit message | Author | Age | Files | Lines
* android: nir: add a load/store vectorization pass | Mauro Rossi | 2019-12-27 | 1 | -0/+1
    Fixes the following aco building error:

    external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:846: error: undefined reference to 'nir_opt_load_store_vectorize'

    Fixes: ce9205c ("nir: add a load/store vectorization pass")
    Signed-off-by: Mauro Rossi
    Reviewed-by: Lionel Landwerlin
* nir: sanitize work group intrinsics to always be 32-bit. | Dave Airlie | 2019-12-27 | 1 | -0/+4
    This saves handling them in the backend later.

    Reviewed-by: Karol Herbst
* nir+vtn: vec8+vec16 support | Rob Clark | 2019-12-21 | 14 | -24/+116
    This introduces new vec8 and vec16 instructions (which are the only instructions taking more than 4 sources), in order to construct 8 and 16 component vectors.

    In order to avoid fixing up the non-autogenerated nir_build_alu() sites and making them pass 16 src args for the benefit of the two instructions that take more than 4 srcs (i.e. vec8 and vec16), nir_build_alu() has nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is used for the > 4 src args case).

    v2 (Karol Herbst):
        use nir_build_alu2 for vec8 and vec16
        use python's array multiplication syntax
        add nir_op_vec helper
        simplify nir_vec
        nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
        use nir_build_alu for opcodes with <= 4 sources
    v3 (Karol Herbst):
        fix nir_serialize
    v4 (Dave Airlie):
        fix serialization of glsl_type
        handle vec8/16 in lowering of bools
    v5 (Karol Herbst):
        fix load store vectorizer

    Signed-off-by: Karol Herbst
    Reviewed-by: Dave Airlie
* nir/serialize: cast swizzle before shifting | Karol Herbst | 2019-12-21 | 1 | -1/+1
    fixes undefined behaviour with enabled vec16

    Signed-off-by: Karol Herbst
    Reviewed-by: Dave Airlie
* spirv: Implement SPV_KHR_non_semantic_info | Caio Marcelo de Oliveira Filho | 2019-12-19 | 1 | -0/+29
    Do nothing for OpExtInst from extended instruction sets whose names start with "NonSemantic.". Since they can be used within the "preamble" to annotate global decorations, also don't stop iterating when one of them is found.

    Reviewed-by: Jason Ekstrand
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3154>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3154>
* nir: fix assign_io_var_locations for vertex inputs | Jonathan Marek | 2019-12-19 | 1 | -3/+9
    Also fixes fragment inputs using the wrong "base" value (which was working only because FRAG_RESULT_DATA0 is less than VARYING_SLOT_VAR0)

    Signed-off-by: Jonathan Marek
    Reviewed-by: Eric Anholt
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3108>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3108>
* Revert "nir/lower_double_ops: relax lower mod()" | Juan A. Suarez Romero | 2019-12-19 | 1 | -15/+6
    This reverts commit 8172b1fa03fe74165728bfb182c98a3e62193d2b.

    That commit was written with the Vulkan spec in mind, without realizing that it also affected OpenGL.

    Closes: #2252
* nir/lower_double_ops: relax lower mod() | Juan A. Suarez Romero | 2019-12-19 | 1 | -6/+15
    Currently when lowering mod() we add an extra instruction so if mod(a,b) == b then 0 is returned instead of b, as mathematically mod(a,b) is in the interval [0, b).

    But the Vulkan spec has relaxed this restriction, and allows the result to be in the interval [0, b].

    This commit takes this into account to remove the extra instruction required to return 0 instead.

    Reviewed-by: Samuel Iglesias Gonsálvez
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
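The extra instruction mentioned in this message can be illustrated with a small sketch. It is not the code of nir_lower_double_ops: `lower_mod` and its `strict` flag are hypothetical names, and the formula `a - b * floor(a / b)` is assumed here as a common way to lower mod().

```python
import math

def lower_mod(a, b, strict=True):
    """Hypothetical sketch of lowering mod(a, b); not Mesa's implementation."""
    # Assumed lowering: a - b * floor(a / b). Rounding can make this equal b.
    m = a - b * math.floor(a / b)
    if strict and m == b:
        # The extra step: force the result back into the interval [0, b).
        return 0.0
    return m

# A tiny negative a rounds up to b itself when the extra step is skipped.
relaxed = lower_mod(-1e-17, 1.0, strict=False)
strict = lower_mod(-1e-17, 1.0, strict=True)
```

Under these assumptions the relaxed variant yields a value in [0, b] and the strict variant a value in [0, b), which is the difference the commit message describes.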
* nir: add option to lower half packing opcodes | Jonathan Marek | 2019-12-16 | 2 | -0/+14
    Signed-off-by: Jonathan Marek
    Reviewed-by: Alyssa Rosenzweig
    Reviewed-by: Eric Anholt
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
* v3d: handle writes to gl_Layer from geometry shaders | Iago Toral Quiroga | 2019-12-16 | 1 | -0/+4
    When geometry shaders write a value to gl_Layer that doesn't correspond to an existing layer in the target framebuffer, the rendering behavior is undefined according to the spec. However, there are CTS tests that trigger this scenario on purpose, probably to ensure that nothing terrible happens.

    For V3D, this situation is problematic because the binner uses the layer index to select the offset to write into the tile state data, and we only allocate tile state for MAX2(num_layers, 1), so we want to make sure we don't produce values that would lead to out of bounds writes.

    The simulator has an assert to catch this. Although we haven't observed issues on actual hardware, it is probably best to play it safe.

    Reviewed-by: Alejandro Piñeiro
* nir/opt_peephole_select: remove unused variables | Alejandro Piñeiro | 2019-12-13 | 1 | -4/+0
    To avoid "unused variable" warnings.

    Reviewed-by: Ian Romanick
* st/glsl_to_nir: use nir based program resource list builder | Timothy Arceri | 2019-12-13 | 4 | -5/+12
    Here we use the NIR based builder to add everything to the resource list except for SSO packed varyings. Since the details of those varyings get lost during packing we leave the special handling to the GLSL IR pass for now.

    In order to do this we add some bools to the build resource list functions.

    Using the NIR based resource list builder gets us a step closer to using a native NIR based linker. It should also be faster than the GLSL IR builder, one because the NIR optimisations should mean we add fewer entries due to better optimisations, and two because nir gives us better lists to work with and we don't need to walk the entire IR to find the resources.

    Ack-by: Alejandro Piñeiro
* glsl: add subroutine support to nir_build_program_resource_list() | Timothy Arceri | 2019-12-13 | 1 | -2/+31
    This is required so we can use the NIR linker to link GLSL in addition to spirv.

    Reviewed-by: Alejandro Piñeiro
* glsl: add support for named varyings in nir_build_program_resource_list() | Timothy Arceri | 2019-12-13 | 1 | -15/+286
    This adds support for adding names of varyings to the resource list, which is required for us to use this function with the glsl linker. Support for names is optional for spirv, which is why it had not been added yet.

    This is mostly a copy of the GLSL IR code adapted to nir.

    Reviewed-by: Alejandro Piñeiro
* glsl: copy the new data fields when converting to nir | Timothy Arceri | 2019-12-13 | 1 | -0/+4
    These fields added in the previous commit will be used to make use of a NIR based GLSL linker.

    Reviewed-by: Alejandro Piñeiro
* nir: add some fields to nir_variable_data | Timothy Arceri | 2019-12-13 | 1 | -0/+28
    These will be used to provide NIR linking functionality to GLSL.

    Reviewed-by: Alejandro Piñeiro
* glsl: copy the how_declared field when converting to nir | Timothy Arceri | 2019-12-13 | 1 | -0/+10
    This is needed to make use of nir_build_program_resource_list().

    Reviewed-by: Alejandro Piñeiro
* glsl: move nir_remap_dual_slot_attributes() call out of glsl_to_nir() | Timothy Arceri | 2019-12-13 | 1 | -7/+0
    In order to be able to implement a NIR based glsl linker we need to build the program resource list with NIR. This change delays the remapping so that a later commit can call the NIR based resource list builder.

    Reviewed-by: Alejandro Piñeiro
* nir: Don't copy empty array | Tomeu Vizoso | 2019-12-12 | 1 | -2/+4
    It's undefined behavior UBSAN complains about, so fixing this will reduce the noise a bit.

    ../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null
        #0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710
        #1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740
        #2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54
        #3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
        #4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135
        #5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335
        #6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518
        #7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
        #8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
        #9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
        #10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
        #11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
        #12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

    Signed-off-by: Tomeu Vizoso
    Reviewed-by: Alyssa Rosenzweig
* vtn/opencl: add shuffle/shuffle support | Dave Airlie | 2019-12-12 | 1 | -1/+52
    This adds NIR encoding for these. Generating them from libclc was very expensive, and this is a lot simpler.

    Reviewed-by: Karol Herbst
* vtn: convert vload/store to single value loops | Dave Airlie | 2019-12-12 | 1 | -11/+20
    There is an alignment issue doing this the other way; the spec clearly says vload/store don't require alignment.

    Reviewed-by: Karol Herbst
* nir: handle nir_deref_type_ptr_as_array in rematerialize_deref_in_block | Karol Herbst | 2019-12-11 | 1 | -0/+1
    I forgot why that was required, but it still is the correct thing to do. Hit it at some point when working on implementing more CL features.

    Signed-off-by: Karol Herbst
    Reviewed-by: Dave Airlie
* spirv: add OpLifetime* | Rob Clark | 2019-12-11 | 1 | -0/+4
    These are just hints so we can ignore them.

    Signed-off-by: Rob Clark
    Reviewed-by: Karol Herbst
    Reviewed-by: Dave Airlie
    Signed-off-by: Karol Herbst
* spirv: handle UniformConstant for OpenCL kernels | Karol Herbst | 2019-12-11 | 3 | -2/+19
    The caller is responsible for setting up the ubo_addr_format value as, contrary to shared and global, it's not controlled by the spirv.

    Right now clover's implementation of CL constant memory uses a 24/8 bit format to encode the buffer index and offset, but that code is dead as all backends treat constants as global memory to work around annoying issues within OpenCL.

    Maybe that will change, maybe not. But just in case somebody wants to look at it, add a toggle for this inside vtn.

    Signed-off-by: Karol Herbst
    Reviewed-by: Dave Airlie
* nir/tests: MSVC build fix | Karol Herbst | 2019-12-11 | 1 | -14/+11
    Fixes: 11f736a6f9c "nir/tests: add serializer tests"
    Signed-off-by: Karol Herbst
    Reviewed-by: Eric Engestrom
* nir/tests: add serializer tests | Karol Herbst | 2019-12-11 | 2 | -0/+299
    Signed-off-by: Karol Herbst
    Reviewed-by: Connor Abbott
* nir/serialize: fix vec8 and vec16 | Karol Herbst | 2019-12-11 | 1 | -12/+17
    NIR serialize uses nir_ssa_alu_instr_src_components in a few places to determine how many components a src has, but that's not what this function returns. It simply returns how many channels are used, which is still fine for most of the code.

    This was breaking code like this:

        vec16 32 ssa_1 = intrinsic load_global
        vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b

    v2: make the 16bit encoding work for identity swizzles again

    Signed-off-by: Karol Herbst
    Reviewed-by: Connor Abbott
* compiler/spirv: Fix uses of gnu struct = {} extension | Pierre Moreau | 2019-12-11 | 1 | -1/+1
    Fixes: a24d6fbae60 ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args")
    Reviewed-by: Caio Marcelo de Oliveira Filho
    Tested-by: Vinson Lee
    Signed-off-by: Pierre Moreau
* glsl/nir: iterate the system values list when adding varyings | Timothy Arceri | 2019-12-05 | 1 | -25/+36
    Iterate the system values list when adding varyings to the program resource list in the NIR linker. This is needed to avoid CTS regressions when using NIR to build the GLSL resource list in an upcoming series. Presumably it also fixes a bug with the current ARB_gl_spirv support.

    Fixes: ffdb44d3a0a2 ("nir/linker: Add inputs/outputs to the program resource list")
    Reviewed-by: Alejandro Piñeiro
* glsl/tests: Use splitlines() instead of strip() | Michel Dänzer | 2019-12-05 | 1 | -2/+2
    strip() removes leading and trailing newlines, but leaves newlines between multiple lines in the string. This could cause failures when comparing the output of cross-compiled Windows binaries (producing Windows-style newlines) to the expected output with Unix-style newlines.

    Reviewed-by: Dylan Baker
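The difference described in this message can be reproduced with two short strings. The strings below are made-up placeholders, not output from the actual GLSL tests.

```python
# Hypothetical outputs: one with Windows-style newlines, one with Unix-style.
actual = "line one\r\nline two\r\n"
expected = "line one\nline two\n"

# strip() only trims the ends, so the interior "\r\n" still differs from "\n".
strip_equal = actual.strip() == expected.strip()

# splitlines() splits on either newline style and drops the terminators.
splitlines_equal = actual.splitlines() == expected.splitlines()
```

Comparing lists of lines makes the check independent of which newline convention the binary under test emits.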
* glsl: make use of active_shader_mask when building resource list | Timothy Arceri | 2019-12-05 | 1 | -12/+1
    This allows us to avoid walking the entire IR looking for used uniforms.

    Reviewed-by: Tapani Pälli
* glsl: don't set uniform block as used when it's not | Timothy Arceri | 2019-12-05 | 2 | -2/+10
    The spec requires unused uniform blocks to be set as active in the program resource list. To support this we tell opt dead code not to remove them. However we can mark them as unused internally and avoid unnecessary state changes.

    This change is also required for the following clean-up patch.

    Reviewed-by: Tapani Pälli
* glsl: move calculate_array_size_and_stride() to link_uniforms.cpp | Timothy Arceri | 2019-12-05 | 2 | -216/+218
    This is where all the other uniform values are populated so it makes much more sense here. Moving it will also allow us to better share code between the NIR and GLSL IR resource list builders.

    Reviewed-by: Tapani Pälli
* nir/lower_clip: Fix incorrect driver loc for clipdist outputs | Rob Clark | 2019-12-04 | 1 | -0/+11
    Somehow adjusting maxloc based on existing outputs got lost, resulting in the clipdist varying clobbering the position varying. This results in a shader with no position output in freedreno/ir3, which triggers GPU hangs in neverball.

    Fixes: d0f746b6458 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
    Signed-off-by: Rob Clark
    Reviewed-by: Kristian H. Kristensen
* glsl: additional interface redeclaration check for SSO programs | Tapani Pälli | 2019-12-04 | 1 | -0/+54
    Patch adds an additional linker check for SSO programs to make sure they are redeclaring built-in blocks as required by the desktop spec.

    This fixes the following Piglit tests:

        arb_separate_shader_objects/linker/pervertex-*

    Signed-off-by: Tapani Pälli
    Reviewed-by: Timothy Arceri
* nir/load_store_vectorize: fix combining stores with aliasing loads between | Rhys Perry | 2019-12-04 | 2 | -2/+16
    v2: add test

    Fixes: ce9205c03bd ('nir: add a load/store vectorization pass')
    Signed-off-by: Rhys Perry
    Reviewed-by: Daniel Schürmann (v1)
    Reviewed-by: Connor Abbott (v2)
* nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select | Ian Romanick | 2019-12-02 | 1 | -0/+53
    Reviewed-by: Matt Turner

    All Intel platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
    instructions in affected programs: 316166 -> 309237 (-2.19%)
    helped: 905
    HURT: 10
    helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
    helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
    95% mean confidence interval for instructions value: -7.91 -7.23
    95% mean confidence interval for instructions %-change: -4.46% -3.99%
    Instructions are helped.

    total cycles in shared programs: 228571646 -> 228549759 (<.01%)
    cycles in affected programs: 56239919 -> 56218032 (-0.04%)
    helped: 681
    HURT: 216
    helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
    helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
    HURT stats (abs) min: 1 max: 320 x̄: 42.09 x̃: 14
    HURT stats (rel) min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
    95% mean confidence interval for cycles value: -41.51 -7.29
    95% mean confidence interval for cycles %-change: -0.80% -0.49%
    Cycles are helped.

    LOST: 1
    GAINED: 0
* nir/algebraic: Simplify some Inf and NaN avoidance code | Ian Romanick | 2019-12-02 | 1 | -0/+9
    Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq should only return Inf when fsqrt returns 0. The changes are pretty small, but this turns a few hundred hurt shaders in the next patch into helped shaders.

    An alternative to the intBitsToFloat is to import numpy and do np.finfo(np.float32).max. That's more explicit, but we may also want to have specific bit encodings of float values later. I could be convinced either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.

    Reviewed-by: Alyssa Rosenzweig
    Reviewed-by: Matt Turner

    All Gen7+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14661140 -> 14661104 (<.01%)
    instructions in affected programs: 7520 -> 7484 (-0.48%)
    helped: 36
    HURT: 0
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
    95% mean confidence interval for instructions value: -1.00 -1.00
    95% mean confidence interval for instructions %-change: -0.52% -0.47%
    Instructions are helped.

    total cycles in shared programs: 228585416 -> 228584806 (<.01%)
    cycles in affected programs: 56321 -> 55711 (-1.08%)
    helped: 32
    HURT: 0
    helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
    helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
    95% mean confidence interval for cycles value: -28.32 -9.80
    95% mean confidence interval for cycles %-change: -1.63% -0.54%
    Cycles are helped.

    Sandy Bridge
    total cycles in shared programs: 152991077 -> 152991075 (<.01%)
    cycles in affected programs: 11525 -> 11523 (-0.02%)
    helped: 2
    HURT: 2
    helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
    95% mean confidence interval for cycles value: -5.27 4.27
    95% mean confidence interval for cycles %-change: -0.16% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

    No changes on Iron Lake or GM45.
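The bit pattern 0x7f7fffff mentioned in this message is the largest finite 32-bit float. The sketch below shows one way such a conversion could be written with the standard `struct` module; `int_bits_to_float` is a hypothetical helper and is not claimed to match the intBitsToFloat used by the algebraic pass.

```python
import struct

def int_bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single (sketch)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Largest finite float32: sign 0, exponent 0xfe, mantissa all ones.
largest_finite = int_bits_to_float(0x7F7FFFFF)

# One bit pattern higher is positive infinity.
infinity = int_bits_to_float(0x7F800000)
```

Working from explicit bit encodings, as the message notes, also leaves room for other special values later, whereas np.finfo(np.float32).max only names this one constant.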
* nir/opt_peephole_select: Don't count some unary operations | Ian Romanick | 2019-12-02 | 1 | -1/+15
    In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into another instruction as either source or destination modifiers. Counting them as instructions means that some if-statements won't get converted to selects. For example,

        vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
        /* succs: block_1 block_2 */
        if ssa_25 {
            block block_1:
            /* preds: block_0 */
            vec1 32 ssa_26 = fabs ssa_24
            vec1 32 ssa_27 = fneg ssa_26
            vec1 32 ssa_28 = fabs ssa_20
            vec1 32 ssa_29 = fneg ssa_28
            vec1 32 ssa_30 = fmul ssa_27, ssa_29
            vec1 32 ssa_31 = fsat ssa_30
            /* succs: block_3 */
        } else {
            block block_2:
            /* preds: block_0 */
            /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */

    block_1 isn't really 6 instructions, but it will be counted that way. Most callers of the peephole_select pass use either 1 or 8. It's very easy to blow way past either of these limits with things that are really only one or two actual instructions.

    I also tried some fancier things like making sure the fsat was of another SSA def from the same block, but the simple test was actually better.

    The i965 back-end SEL peephole pass still helps ~700 shaders in shader-db with this change.

    Reviewed-by: Alyssa Rosenzweig
    Reviewed-by: Matt Turner

    All Gen6+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
    instructions in affected programs: 156575 -> 151791 (-3.06%)
    helped: 1204
    HURT: 0
    helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
    helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
    95% mean confidence interval for instructions value: -4.12 -3.82
    95% mean confidence interval for instructions %-change: -5.35% -4.95%
    Instructions are helped.

    total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
    cycles in affected programs: 2818975 -> 2672750 (-5.19%)
    helped: 876
    HURT: 322
    helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
    helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
    HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
    HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
    95% mean confidence interval for cycles value: -130.47 -113.64
    95% mean confidence interval for cycles %-change: -14.85% -12.72%
    Cycles are helped.

    total sends in shared programs: 730495 -> 730491 (<.01%)
    sends in affected programs: 46 -> 42 (-8.70%)
    helped: 2
    HURT: 0

    Iron Lake and GM45 had similar results. (Iron Lake shown)
    total instructions in shared programs: 8122757 -> 8122617 (<.01%)
    instructions in affected programs: 14716 -> 14576 (-0.95%)
    helped: 46
    HURT: 1
    helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
    helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
    95% mean confidence interval for instructions value: -3.42 -2.54
    95% mean confidence interval for instructions %-change: -3.28% -1.62%
    Instructions are helped.

    total cycles in shared programs: 188510100 -> 188509780 (<.01%)
    cycles in affected programs: 58994 -> 58674 (-0.54%)
    helped: 32
    HURT: 1
    helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
    helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
    95% mean confidence interval for cycles value: -16.34 -3.06
    95% mean confidence interval for cycles %-change: -2.46% -0.15%
    Cycles are helped.
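The counting rule from this message can be sketched as follows. This is a simplified model, not the pass itself: `count_instructions` and `FOLDABLE` are hypothetical names, and the real pass may apply further conditions before skipping an opcode (the title says only "some" unary operations).

```python
# Opcodes the commit message lists as usually folding into modifiers.
FOLDABLE = {"fsat", "fneg", "fabs", "ineg", "iabs"}

def count_instructions(opcodes, skip_foldable=True):
    """Count the opcodes in a block, optionally ignoring foldable unary ops."""
    return sum(1 for op in opcodes if not (skip_foldable and op in FOLDABLE))

# Opcodes of block_1 from the example in the commit message.
block_1 = ["fabs", "fneg", "fabs", "fneg", "fmul", "fsat"]

old_count = count_instructions(block_1, skip_foldable=False)
new_count = count_instructions(block_1)
```

With a limit of 1, which the message says is a common caller setting, block_1 would be rejected under the old count and accepted under the new one in this model.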
* nir/lower_io_to_vector: don't create arrays when not needed | Rhys Perry | 2019-12-02 | 1 | -1/+7
    Some backends require that there are no array varyings. If there were no arrays in the input shader, the pass shouldn't have to create new ones.

    Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
    Fixes: bcd14756eec ('nir/lower_io_to_vector: add flat mode')
    Signed-off-by: Rhys Perry
    Reviewed-by: Connor Abbott
* nir/samplers: don't zero samplers_used/txf. | Dave Airlie | 2019-12-02 | 1 | -3/+0
    This allows this pass to be run multiple times and the results are just or'ed together.

    It fixes one test on llvmpipe nir, and regresses none.

    Suggested by Kenneth

    Reviewed-by: Marek Olšák
* glsl: handle max uniform limits with lower_const_arrays_to_uniforms | Tapani Pälli | 2019-11-28 | 3 | -5/+40
    Fixes arb_tessellation_shader-large-uniforms Piglit test.

    Signed-off-by: Tapani Pälli
    Reviewed-by: Kenneth Graunke
* driconf, glsl: Add a vs_position_always_invariant option | Kenneth Graunke | 2019-11-27 | 1 | -0/+6
    Many applications use multi-pass rendering and require their vertex shader position to be computed the same way each time. Optimizations may consider, say, fusing a multiply-add based on global usage of an expression in a shader. But a second shader with the same expression may have different code, causing that optimization to make the other choice the second time around.

    The correct solution is for applications to mark their VS outputs 'invariant', indicating they need multiple shaders to compute that output in the same manner. However, most applications fail to do so.

    So, we add a new driconf option - vs_position_always_invariant - which forces the gl_Position output in vertex shaders to be marked invariant.

    Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
    Reviewed-by: Eric Anholt
    Reviewed-by: Ian Romanick
* nir: Make algebraic backtrack and reprocess after a replacement. | Eric Anholt | 2019-11-26 | 2 | -22/+97
    The algebraic pass was exhibiting O(n^2) behavior in dEQP-GLES2.functional.uniform_api.random.3 and dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with other code-generated tests, and likely real-world loop-unroll cases). In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to transform:

        result = b2f(a == b);
        result *= b2f(c == d);
        ...
        result *= b2f(z == w);

    ->

        temp = (a == b)
        temp = temp && (c == d)
        ...
        temp = temp && (z == w)
        result = b2f(temp);

    nir_opt_algebraic, proceeding bottom-to-top, would match and convert the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to be matched by the next fmul down on the next time algebraic got run by the optimization loop.

    Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse order."), Matt changed algebraic to go bottom-to-top so that we would match the biggest patterns first. This helped his cases, but I believe introduced this failure mode. Instead of reverting that, now that we've got the automaton, we can update the automaton's state recursively and just re-process any instructions whose state has changed (indicating that they might match new things). There's a small chance that the state will hash to the same value and miss out on this round of algebraic, but this seems to be good enough to fix dEQP.

    Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):

    Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling outliers removed)
    dEQP-GLES2.functional.uniform_api.random.3 runtime -65.3512% +/- 4.22369% (n=21, was 1.4s)
    dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime -68.8066% +/- 6.49523% (was 4.8s)

    v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of tricky code. Runtime of uniform_api.random.3 down -0.790299% +/- 0.244213% compared to v1.
    v3: Re-add the nir_instr_remove() that I accidentally dropped in v2, fixing infinite loops.

    Reviewed-by: Connor Abbott
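The quadratic behavior described in this message can be reproduced in a toy model. Everything below is an illustrative assumption: a chain of n fmul instructions is reduced to a list of opcode names, `matches` stands in for the fmul(b2f, b2f) pattern, and a single list is used as the worklist. The real pass tracks automaton state and uses two worklists, which this sketch does not attempt to model.

```python
def matches(kinds, i):
    # Instruction i multiplies the result of instruction i - 1 by a b2f, so the
    # pattern only matches once its first source has itself become a b2f.
    return kinds[i] == "fmul" and (i == 0 or kinds[i - 1] == "b2f")

def run_repeated_passes(n):
    """Bottom-to-top passes repeated until no progress; returns visit count."""
    kinds = ["fmul"] * n
    visits = 0
    progress = True
    while progress:
        progress = False
        for i in reversed(range(n)):
            visits += 1
            if matches(kinds, i):
                kinds[i] = "b2f"  # replacement produces a new b2f
                progress = True
    return visits

def run_with_reprocess(n):
    """Single walk that re-queues the user of each replaced instruction."""
    kinds = ["fmul"] * n
    visits = 0
    worklist = list(range(n))  # pop() takes the bottom-most instruction first
    while worklist:
        i = worklist.pop()
        visits += 1
        if matches(kinds, i):
            kinds[i] = "b2f"
            if i + 1 < n:
                worklist.append(i + 1)  # the user may match now
    return visits
```

In this model the repeated-pass variant visits n * (n + 1) instructions while the reprocessing variant visits 2n - 1, which is the kind of gap the runtime numbers in the message reflect.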
* nir: Refactor algebraic's block walk | Eric Anholt | 2019-11-26 | 1 | -31/+31
    My motivation was to clarify the changes in the following commit, but incidentally, it reduces runtime of dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy testcase) by -5.39524% +/- 2.21179% (n=15)

    Reviewed-by: Connor Abbott
* nir: Maintain the algebraic automaton's state as we work. | Connor Abbott | 2019-11-26 | 2 | -38/+78
    In order to have nir_opt_algebraic be able to do further algebraic work on the output of a replacement, we need to maintain the automaton's state.

    Reviewed-by: Eric Anholt
* nir: Add a scheduler pass to reduce maximum register pressure. | Eric Anholt | 2019-11-25 | 4 | -0/+1093
    This is similar to a scheduler I've written for vc4 and i965, but this time written at the NIR level so that hopefully it's reusable. A notable new feature it has is Goodman/Hsu's heuristic of "once we've started processing the uses of a value, prioritize processing the rest of their uses", which should help avoid the heuristic otherwise making such systematically bad choices around getting texture results consumed.

    Results for v3d:

    total instructions in shared programs: 6497588 -> 6518242 (0.32%)
    total threads in shared programs: 154000 -> 152828 (-0.76%)
    total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
    total spills in shared programs: 4984 -> 472 (-90.53%)
    total fills in shared programs: 6418 -> 1546 (-75.91%)

    Acked-by: Alyssa Rosenzweig (v1)
    Reviewed-by: Alejandro Piñeiro (v2)

    v2: Use the DAG datastructure, fold in the scheduling-for-parallelism patch, include SSA defs in live values so we can switch to bottom-up if we want.
    v3: Squash in improvements from Alejandro Piñeiro for getting V3D to successfully register allocate on GLES3.1 dEQP. Make sure that discards don't move after store_output. Comment spelling fix.
* nir: add load/store vectorizer tests | Rhys Perry | 2019-11-25 | 2 | -0/+1763
    v7: run nir_opt_algebraic
    v9: rework the callback function
    v9: update alignment on all loads/stores, even if they're not vectorized
    v10: add tests for 64-bit offsets
    v10: add tests for signed offsets

    Signed-off-by: Rhys Perry
    Reviewed-by: Connor Abbott (v9)
* nir: add a load/store vectorization pass | Rhys Perry | 2019-11-25 | 3 | -0/+1313
    This pass combines intersecting, adjacent and identical loads/stores into potentially larger ones and will be used by ACO to greatly reduce the number of memory operations.

    v2: handle nir_deref_type_ptr_as_array
    v3: assume explicitly laid out types for derefs
    v4: create less deref casts
    v4: fix shared boolean vectorization
    v4: fix copy+paste error in resources_different
    v4: fix extract_subvector() to pass nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
    v4: rebase
    v5: subtract from deref/offset instead of scheduling offset calculations
    v5: various non-functional changes/cleanups
    v5: require less metadata and preserve more
    v5: rebase
    v6: cleanup and improve dependency handling
    v6: emit less deref casts
    v6: pass undef to components not set in the write_mask for new stores
    v7: fix 8-bit extract_vector() with 64-bit input
    v7: cleanup creation of store write data
    v7: update align correctly for when the bit size of load/store increases
    v7: rename extract_vector to extract_component and update comment
    v8: prevent combining of row-major matrix column accesses
    v9: rework process_block() to be able to vectorize more
    v9: rework the callback function
    v9: update alignment on all loads/stores, even if they're not vectorized
    v9: remove entry::store_value, since it will not be updated if it was from a vectorized load
    v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
    v9: handle nir_intrinsic_scoped_memory_barrier
    v10: use nir_ssa_scalar
    v10: handle non-32-bit offsets
    v10: use signed offsets for comparison
    v10: improve create_entry_key_from_offset()
    v10: support load_shared/store_shared
    v10: remove strip_deref_casts()
    v10: don't ever pass NULL to memcmp
    v10: remove recursion in gcd()
    v10: fix outdated comment
    v11: use the new nir_extract_bits()
    v12: remove use of nir_src_as_const_value in resources_different
    v13: make entry key hash function deterministic
    v13: simplify mask_sign_extend()
    v14: add comment in hash_entry_key() about hashing pointers

    Signed-off-by: Rhys Perry
    Reviewed-by: Connor Abbott (v9)
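The idea of combining intersecting, adjacent and identical accesses can be sketched on plain byte ranges. This is a toy model under stated assumptions: loads are `(offset, size)` pairs into a single buffer, `combine_loads` and `max_bytes` are hypothetical names, and aliasing, alignment, barriers and the driver callback that the real pass handles are all ignored.

```python
def combine_loads(loads, max_bytes=16):
    """Merge (offset, size) byte ranges that touch or overlap, up to max_bytes."""
    merged = []
    for offset, size in sorted(loads):
        if merged:
            prev_offset, prev_size = merged[-1]
            prev_end = prev_offset + prev_size
            new_end = max(prev_end, offset + size)
            # Adjacent (offset == prev_end), intersecting or identical ranges
            # are folded into the previous load if the result stays small enough.
            if offset <= prev_end and new_end - prev_offset <= max_bytes:
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, size))
    return merged

# Three adjacent 4-byte loads plus one separated by a gap.
result = combine_loads([(0, 4), (4, 4), (8, 4), (16, 4)])
```

In this model four loads become two, which is the reduction in memory operations the message refers to; the size cap is a stand-in for whatever limit a backend would impose.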
* nir: add nir_num_variable_modes and nir_var_mem_push_const | Rhys Perry | 2019-11-25 | 2 | -2/+9
    These will be useful in the upcoming load/store vectorizer.

    v11: rebase

    Signed-off-by: Rhys Perry
    Reviewed-by: Connor Abbott
    Reviewed-by: Jason Ekstrand