path: root/src/compiler
Commit message (Author, Age, Files, Lines)
* nir: make nir_get_texture_size/lod available outside nir_lower_tex (Gert Wollny, 2020-01-04, 3 files, -110/+117)
    These functions can be useful in other places.

    Signed-off-by: Gert Wollny <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3286>
* spirv: Fix glsl type assert in spir2nir. (Bas Nieuwenhuizen, 2020-01-04, 1 file, -0/+4)
    Fixes: 624789e3708 "compiler/glsl: handle case where we have multiple users for types"
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/zink: move clip_halfz-lowering to common code (Erik Faye-Lund, 2020-01-03, 4 files, -0/+81)
    Etnaviv also does the same thing, so let's try to avoid repetition
    here, and use the same code for it as well.

    Reviewed-by: Jonathan Marek <[email protected]>
    Tested-by: Paul Cercueil <[email protected]>
* st/nir: Optionally unify inputs_read/outputs_written when linking. (Kenneth Graunke, 2020-01-03, 1 file, -0/+6)
    i965 and iris use inputs_read/outputs_written for a shader stage to
    determine the layout of input and output storage. Adjacent stages must
    agree on the layout, so adjacent input/output bitfields must match.

    This patch adds a new nir_shader_compiler_options::unify_interfaces
    flag which asks the linker to unify the input/output interfaces
    between adjacent stages.

    Reviewed-by: Timothy Arceri <[email protected]>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
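The requirement that adjacent input/output bitfields match can be pictured with a small model. This is only a sketch: the `StageInfo` class and the choice of a bitwise union are assumptions made for illustration, and the commit message does not say how the linker actually computes the unified interface.

```python
from dataclasses import dataclass


@dataclass
class StageInfo:
    """Hypothetical stand-in for the per-stage info; only the two bitfields."""
    inputs_read: int = 0
    outputs_written: int = 0


def unify_interfaces(producer: StageInfo, consumer: StageInfo) -> None:
    """Make both stages agree on one interface bitfield.

    Assumption: the agreed interface is the union of what the producer
    writes and what the consumer reads.
    """
    shared = producer.outputs_written | consumer.inputs_read
    producer.outputs_written = shared
    consumer.inputs_read = shared


vs = StageInfo(outputs_written=0b0111)  # writes slots 0, 1, 2
fs = StageInfo(inputs_read=0b0101)      # reads slots 0 and 2 only
unify_interfaces(vs, fs)
print(bin(vs.outputs_written), bin(fs.inputs_read))
```

After the call both stages carry the same bitfield, so a driver that derives its storage layout from it sees the same layout on both sides.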
* nir: print non-uniform tex fields. (Bas Nieuwenhuizen, 2020-01-02, 1 file, -0/+8)
    To ease debugging in the future.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3246>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3246>
* nir: Add clone/hash/serialize support for non-uniform tex instructions. (Bas Nieuwenhuizen, 2020-01-02, 3 files, -1/+12)
    These were missed when the fields got added. Added them everywhere
    texture_index is used and where it made sense.

    Found this in "The Surge 2", where the inliner does not copy the
    fields, resulting in corruption and hangs.

    Fixes: 3bd54576415 "nir: Add a lowering pass for non-uniform resource access"
    Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1203
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3246>
* glsl: Set .flat for gl_FrontFacing (Alyssa Rosenzweig, 2019-12-30, 1 file, -4/+7)
    It is a boolean.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3237>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3237>
* android: nir: add a load/store vectorization pass (Mauro Rossi, 2019-12-27, 1 file, -0/+1)
    Fixes the following aco build error:

    external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:846:
    error: undefined reference to 'nir_opt_load_store_vectorize'

    Fixes: ce9205c ("nir: add a load/store vectorization pass")
    Signed-off-by: Mauro Rossi <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* nir: sanitize work group intrinsics to always be 32-bit. (Dave Airlie, 2019-12-27, 1 file, -0/+4)
    This saves handling them in the backend later.

    Reviewed-by: Karol Herbst <[email protected]>
* nir+vtn: vec8+vec16 support (Rob Clark, 2019-12-21, 14 files, -24/+116)
    This introduces new vec8 and vec16 instructions (which are the only
    instructions taking more than 4 sources), in order to construct 8 and
    16 component vectors.

    In order to avoid fixing up the non-autogenerated nir_build_alu()
    sites and making them pass 16 src args for the benefit of the two
    instructions that take more than 4 srcs (i.e. vec8 and vec16),
    nir_build_alu() has nir_build_alu_tail() split out and re-used by
    nir_build_alu2() (which is used for the > 4 src args case).

    v2 (Karol Herbst):
      use nir_build_alu2 for vec8 and vec16
      use python's array multiplication syntax
      add nir_op_vec helper
      simplify nir_vec
      nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
      use nir_build_alu for opcodes with <= 4 sources
    v3 (Karol Herbst):
      fix nir_serialize
    v4 (Dave Airlie):
      fix serialization of glsl_type
      handle vec8/16 in lowering of bools
    v5 (Karol Herbst):
      fix load store vectorizer

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* nir/serialize: cast swizzle before shifting (Karol Herbst, 2019-12-21, 1 file, -1/+1)
    Fixes undefined behaviour when vec16 is enabled.

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* spirv: Implement SPV_KHR_non_semantic_info (Caio Marcelo de Oliveira Filho, 2019-12-19, 1 file, -0/+29)
    Do nothing for OpExtInst from extended instruction sets whose names
    start with "NonSemantic.". Since they can be used within the
    "preamble" to annotate global decorations, also don't stop iterating
    when one of them is found.

    Reviewed-by: Jason Ekstrand <[email protected]>
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3154>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3154>
* nir: fix assign_io_var_locations for vertex inputs (Jonathan Marek, 2019-12-19, 1 file, -3/+9)
    Also fixes fragment inputs using the wrong "base" value (which was
    working only because FRAG_RESULT_DATA0 is less than VARYING_SLOT_VAR0).

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3108>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3108>
* Revert "nir/lower_double_ops: relax lower mod()" (Juan A. Suarez Romero, 2019-12-19, 1 file, -15/+6)
    This reverts commit 8172b1fa03fe74165728bfb182c98a3e62193d2b.

    That commit was written with the Vulkan spec in mind, without
    realizing that it affects OpenGL too.

    Closes: #2252
* nir/lower_double_ops: relax lower mod() (Juan A. Suarez Romero, 2019-12-19, 1 file, -6/+15)
    Currently when lowering mod() we add an extra instruction so that if
    mod(a,b) == b then 0 is returned instead of b, as mathematically
    mod(a,b) is in the interval [0, b).

    But the Vulkan spec has relaxed this restriction, and allows the
    result to be in the interval [0, b].

    This commit takes this into account to remove the extra instruction
    required to return 0 instead.

    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
    Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
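The rounding case that the extra instruction guards against can be reproduced with ordinary doubles. The sketch below assumes the lowering follows the GLSL definition mod(a, b) = a - b * floor(a / b); the function names are made up for illustration and are not the pass's own.

```python
import math


def lowered_mod(a: float, b: float) -> float:
    """GLSL-style mod: a - b * floor(a / b), evaluated in double precision."""
    return a - b * math.floor(a / b)


def lowered_mod_strict(a: float, b: float) -> float:
    """Same lowering plus the extra select that maps a result of b back to 0."""
    r = lowered_mod(a, b)
    return 0.0 if r == b else r


# A tiny negative a: floor(a / b) is -1, and a + b rounds to exactly b.
a, b = -1e-20, 1.0
print(lowered_mod(a, b))         # lands on b, outside [0, b)
print(lowered_mod_strict(a, b))  # pulled back into [0, b)
```

The relaxed version simply accepts the first result, which is legal only under a spec that allows the closed interval [0, b].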
* nir: add option to lower half packing opcodes (Jonathan Marek, 2019-12-16, 2 files, -0/+14)
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
* v3d: handle writes to gl_Layer from geometry shaders (Iago Toral Quiroga, 2019-12-16, 1 file, -0/+4)
    When a geometry shader writes a value to gl_Layer that doesn't
    correspond to an existing layer in the target framebuffer, the
    rendering behavior is undefined according to the spec. However, there
    are CTS tests that trigger this scenario on purpose, probably to
    ensure that nothing terrible happens.

    For V3D, this situation is problematic because the binner uses the
    layer index to select the offset to write into the tile state data,
    and we only allocate tile state for MAX2(num_layers, 1), so we want to
    make sure we don't produce values that would lead to out of bounds
    writes. The simulator has an assert to catch this. Although we haven't
    observed issues on actual hardware, it is probably best to play it
    safe.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir/opt_peephole_select: remove unused variables (Alejandro Piñeiro, 2019-12-13, 1 file, -4/+0)
    To avoid "unused variable" warnings.

    Reviewed-by: Ian Romanick <[email protected]>
* st/glsl_to_nir: use nir based program resource list builder (Timothy Arceri, 2019-12-13, 4 files, -5/+12)
    Here we use the NIR based builder to add everything to the resource
    list except for SSO packed varyings. Since the details of those
    varyings get lost during packing we leave the special handling to the
    GLSL IR pass for now.

    In order to do this we add some bools to the build resource list
    functions.

    Using the NIR based resource list builder gets us a step closer to
    using a native NIR based linker. It should also be faster than the
    GLSL IR builder for two reasons: better NIR optimisations should mean
    we add fewer entries, and NIR gives us better lists to work with, so
    we don't need to walk the entire IR to find the resources.

    Ack-by: Alejandro Piñeiro <[email protected]>
* glsl: add subroutine support to nir_build_program_resource_list() (Timothy Arceri, 2019-12-13, 1 file, -2/+31)
    This is required so we can use the NIR linker to link GLSL in
    addition to spirv.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl: add support for named varyings in nir_build_program_resource_list() (Timothy Arceri, 2019-12-13, 1 file, -15/+286)
    This adds support for adding the names of varyings to the resource
    list, which is required for us to use this function with the glsl
    linker. Support for names is optional for spirv, which is why it had
    not been added yet.

    This is mostly a copy of the GLSL IR code adapted to nir.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl: copy the new data fields when converting to nir (Timothy Arceri, 2019-12-13, 1 file, -0/+4)
    These fields, added in the previous commit, will be used by a NIR
    based GLSL linker.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir: add some fields to nir_variable_data (Timothy Arceri, 2019-12-13, 1 file, -0/+28)
    These will be used to provide NIR linking functionality to GLSL.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl: copy the how_declared field when converting to nir (Timothy Arceri, 2019-12-13, 1 file, -0/+10)
    This is needed to make use of nir_build_program_resource_list().

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl: move nir_remap_dual_slot_attributes() call out of glsl_to_nir() (Timothy Arceri, 2019-12-13, 1 file, -7/+0)
    In order to be able to implement a NIR based glsl linker we need to
    build the program resource list with NIR. This change delays the
    remapping so that a later commit can call the NIR based resource list
    builder.

    Reviewed-by: Alejandro Piñeiro <[email protected]>
* nir: Don't copy empty array (Tomeu Vizoso, 2019-12-12, 1 file, -2/+4)
    This is undefined behavior that UBSAN complains about, so fixing it
    will reduce the noise a bit.

    ../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710
    #1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740
    #2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54
    #3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
    #4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135
    #5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335
    #6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518
    #7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
    #8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* vtn/opencl: add shuffle/shuffle support (Dave Airlie, 2019-12-12, 1 file, -1/+52)
    This adds nir encoding for these. Generating them from libclc was
    very expensive, and this is a lot simpler.

    Reviewed-by: Karol Herbst <[email protected]>
* vtn: convert vload/store to single value loops (Dave Airlie, 2019-12-12, 1 file, -11/+20)
    There is an alignment issue doing this the other way; the spec
    clearly says vload/store don't require alignment.

    Reviewed-by: Karol Herbst <[email protected]>
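As a rough picture of what "single value loops" means, the sketch below models OpenCL's vloadn/vstoren on a flat Python list: element i of the vector lives at index offset * n + i, and each element is accessed on its own, so no vector-sized alignment of the base address is ever assumed. The function names and the list-backed "memory" are illustrative only, not the vtn code.

```python
def vload(n, offset, buf):
    """Read an n-component vector starting at element offset * n, one scalar at a time."""
    base = offset * n
    return [buf[base + i] for i in range(n)]


def vstore(values, offset, buf):
    """Write each component of the vector with its own scalar store."""
    base = offset * len(values)
    for i, value in enumerate(values):
        buf[base + i] = value


memory = list(range(12))
print(vload(4, 1, memory))       # the four scalars at indices 4..7
vstore([9, 9, 9], 1, memory)     # a 3-component store touches indices 3..5
print(memory)
```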
* nir: handle nir_deref_type_ptr_as_array in rematerialize_deref_in_block (Karol Herbst, 2019-12-11, 1 file, -0/+1)
    I forgot why that was required, but it still is the correct thing to
    do. Hit it at some point when working on implementing more CL
    features.

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* spirv: add OpLifetime* (Rob Clark, 2019-12-11, 1 file, -0/+4)
    These are just hints so we can ignore them.

    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
* spirv: handle UniformConstant for OpenCL kernels (Karol Herbst, 2019-12-11, 3 files, -2/+19)
    The caller is responsible for setting up the ubo_addr_format value
    because, unlike shared and global, it's not controlled by the spirv.

    Right now clover's implementation of CL constant memory uses a 24/8
    bit format to encode the buffer index and offset, but that code is
    dead as all backends treat constants as global memory to work around
    annoying issues within OpenCL.

    Maybe that will change, maybe not. But just in case somebody wants to
    look at it, add a toggle for this inside vtn.

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* nir/tests: MSVC build fix (Karol Herbst, 2019-12-11, 1 file, -14/+11)
    Fixes: 11f736a6f9c "nir/tests: add serializer tests"
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* nir/tests: add serializer tests (Karol Herbst, 2019-12-11, 2 files, -0/+299)
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
* nir/serialize: fix vec8 and vec16 (Karol Herbst, 2019-12-11, 1 file, -12/+17)
    nir_serialize uses nir_ssa_alu_instr_src_components in a few places
    to determine how many components a src has, but that's not what this
    function returns. It simply returns how many channels are used, which
    is still fine for most of the code.

    This was breaking code like this:

        vec16 32 ssa_1 = intrinsic load_global
        vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b

    v2: make the 16bit encoding work for identity swizzles again

    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
* compiler/spirv: Fix uses of gnu struct = {} extension (Pierre Moreau, 2019-12-11, 1 file, -1/+1)
    Fixes: a24d6fbae60 ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args")
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Tested-by: Vinson Lee <[email protected]>
    Signed-off-by: Pierre Moreau <[email protected]>
* glsl/nir: iterate the system values list when adding varyings (Timothy Arceri, 2019-12-05, 1 file, -25/+36)
    Iterate the system values list when adding varyings to the program
    resource list in the NIR linker. This is needed to avoid CTS
    regressions when using NIR to build the GLSL resource list in an
    upcoming series.

    Presumably it also fixes a bug with the current ARB_gl_spirv support.

    Fixes: ffdb44d3a0a2 ("nir/linker: Add inputs/outputs to the program resource list")
    Reviewed-by: Alejandro Piñeiro <[email protected]>
* glsl/tests: Use splitlines() instead of strip() (Michel Dänzer, 2019-12-05, 1 file, -2/+2)
    strip() removes leading and trailing newlines, but leaves newlines
    between multiple lines in the string. This could cause failures when
    comparing the output of cross-compiled Windows binaries (producing
    Windows-style newlines) to the expected output with Unix-style
    newlines.

    Reviewed-by: Dylan Baker <[email protected]>
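The difference is easy to reproduce with two short strings. The strings below are made-up stand-ins for test output; only the standard `str.strip()` and `str.splitlines()` behaviour is being shown.

```python
# Output from a Windows binary versus the expected Unix-style text.
actual = "a\r\nb\r\n"
expected = "a\nb\n"

# strip() only trims the ends, so the interior "\r\n" still differs from "\n".
print(actual.strip() == expected.strip())            # False

# splitlines() splits on either newline style and drops the terminators.
print(actual.splitlines() == expected.splitlines())  # True
```

Comparing the lists of lines makes the test independent of which newline convention the binary under test emits.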
* glsl: make use of active_shader_mask when building resource list (Timothy Arceri, 2019-12-05, 1 file, -12/+1)
    This allows us to avoid walking the entire IR looking for used
    uniforms.

    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: don't set uniform block as used when its not (Timothy Arceri, 2019-12-05, 2 files, -2/+10)
    The spec requires unused uniform blocks to be set as active in the
    program resource list. To support this we tell opt dead code not to
    remove them. However, we can mark them as unused internally and avoid
    unnecessary state changes.

    This change is also required for the following clean-up patch.

    Reviewed-by: Tapani Pälli <[email protected]>
* glsl: move calculate_array_size_and_stride() to link_uniforms.cpp (Timothy Arceri, 2019-12-05, 2 files, -216/+218)
    This is where all the other uniform values are populated so it makes
    much more sense here. Moving it will also allow us to better share
    code between the NIR and GLSL IR resource list builders.

    Reviewed-by: Tapani Pälli <[email protected]>
* nir/lower_clip: Fix incorrect driver loc for clipdist outputs (Rob Clark, 2019-12-04, 1 file, -0/+11)
    Somehow adjusting maxloc based on existing outputs got lost,
    resulting in the clipdist varying clobbering the position varying.
    This produced a shader with no position output in freedreno/ir3,
    which triggers GPU hangs in neverball.

    Fixes: d0f746b6458 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
    Signed-off-by: Rob Clark <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* glsl: additional interface redeclaration check for SSO programs (Tapani Pälli, 2019-12-04, 1 file, -0/+54)
    This patch adds an additional linker check for SSO programs to make
    sure they are redeclaring built-in blocks as required by the desktop
    spec.

    This fixes the following Piglit tests:
    arb_separate_shader_objects/linker/pervertex-*

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* nir/load_store_vectorize: fix combining stores with aliasing loads between (Rhys Perry, 2019-12-04, 2 files, -2/+16)
    v2: add test

    Fixes: ce9205c03bd ('nir: add a load/store vectorization pass')
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Daniel Schürmann <[email protected]> (v1)
    Reviewed-by: Connor Abbott <[email protected]> (v2)
* nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select (Ian Romanick, 2019-12-02, 1 file, -0/+53)
    Reviewed-by: Matt Turner <[email protected]>

    All Intel platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
    instructions in affected programs: 316166 -> 309237 (-2.19%)
    helped: 905
    HURT: 10
    helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
    helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
    95% mean confidence interval for instructions value: -7.91 -7.23
    95% mean confidence interval for instructions %-change: -4.46% -3.99%
    Instructions are helped.

    total cycles in shared programs: 228571646 -> 228549759 (<.01%)
    cycles in affected programs: 56239919 -> 56218032 (-0.04%)
    helped: 681
    HURT: 216
    helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
    helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
    HURT stats (abs) min: 1 max: 320 x̄: 42.09 x̃: 14
    HURT stats (rel) min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
    95% mean confidence interval for cycles value: -41.51 -7.29
    95% mean confidence interval for cycles %-change: -0.80% -0.49%
    Cycles are helped.

    LOST: 1
    GAINED: 0
* nir/algebraic: Simplify some Inf and NaN avoidance code (Ian Romanick, 2019-12-02, 1 file, -0/+9)
    Since a is non-negative, neither fsqrt nor frsq should return NaN.
    frsq should only return Inf when fsqrt returns 0.

    The changes are pretty small, but this turns a few hundred hurt
    shaders in the next patch into helped shaders.

    An alternative to the intBitsToFloat is to import numpy and do
    np.finfo(np.float32).max. That's more explicit, but we may also want
    to have specific bit encodings of float values later. I could be
    convinced either way, but intBitsToFloat(0x7f7fffff) was what I
    implemented first.

    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>

    All Gen7+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14661140 -> 14661104 (<.01%)
    instructions in affected programs: 7520 -> 7484 (-0.48%)
    helped: 36
    HURT: 0
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
    95% mean confidence interval for instructions value: -1.00 -1.00
    95% mean confidence interval for instructions %-change: -0.52% -0.47%
    Instructions are helped.

    total cycles in shared programs: 228585416 -> 228584806 (<.01%)
    cycles in affected programs: 56321 -> 55711 (-1.08%)
    helped: 32
    HURT: 0
    helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
    helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
    95% mean confidence interval for cycles value: -28.32 -9.80
    95% mean confidence interval for cycles %-change: -1.63% -0.54%
    Cycles are helped.

    Sandy Bridge
    total cycles in shared programs: 152991077 -> 152991075 (<.01%)
    cycles in affected programs: 11525 -> 11523 (-0.02%)
    helped: 2
    HURT: 2
    helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
    95% mean confidence interval for cycles value: -5.27 4.27
    95% mean confidence interval for cycles %-change: -0.16% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

    No changes on Iron Lake or GM45.
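For readers unfamiliar with the constant, 0x7f7fffff is the bit pattern of the largest finite 32-bit float. The sketch below reinterprets the bits with the standard `struct` module; the helper name `int_bits_to_float` is an illustrative stand-in and is not claimed to match how the helper in the algebraic rules is written.

```python
import struct


def int_bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


FLT_MAX = int_bits_to_float(0x7F7FFFFF)
print(FLT_MAX)                        # largest finite float32
print(int_bits_to_float(0x3F800000))  # bit pattern of 1.0
```

Spelling the value as a bit pattern keeps the exact encoding visible, which is the advantage the commit message mentions over looking the maximum up through numpy.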
* nir/opt_peephole_select: Don't count some unary operations (Ian Romanick, 2019-12-02, 1 file, -1/+15)
    In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
    another instruction as either source or destination modifiers.
    Counting them as instructions means that some if-statements won't get
    converted to selects. For example,

        vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
        /* succs: block_1 block_2 */
        if ssa_25 {
            block block_1:
            /* preds: block_0 */
            vec1 32 ssa_26 = fabs ssa_24
            vec1 32 ssa_27 = fneg ssa_26
            vec1 32 ssa_28 = fabs ssa_20
            vec1 32 ssa_29 = fneg ssa_28
            vec1 32 ssa_30 = fmul ssa_27, ssa_29
            vec1 32 ssa_31 = fsat ssa_30
            /* succs: block_3 */
        } else {
            block block_2:
            /* preds: block_0 */
            /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */

    block_1 isn't really 6 instructions, but it will be counted that way.
    Most callers of the peephole_select pass use either 1 or 8. It's very
    easy to blow way past either of these limits with things that are
    really only one or two actual instructions.

    I also tried some fancier things like making sure the fsat was of
    another SSA def from the same block, but the simple test was actually
    better.

    The i965 back-end SEL peephole pass still helps ~700 shaders in
    shader-db with this change.

    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>

    All Gen6+ platforms had similar results. (Ice Lake shown)
    total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
    instructions in affected programs: 156575 -> 151791 (-3.06%)
    helped: 1204
    HURT: 0
    helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
    helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
    95% mean confidence interval for instructions value: -4.12 -3.82
    95% mean confidence interval for instructions %-change: -5.35% -4.95%
    Instructions are helped.

    total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
    cycles in affected programs: 2818975 -> 2672750 (-5.19%)
    helped: 876
    HURT: 322
    helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
    helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
    HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
    HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
    95% mean confidence interval for cycles value: -130.47 -113.64
    95% mean confidence interval for cycles %-change: -14.85% -12.72%
    Cycles are helped.

    total sends in shared programs: 730495 -> 730491 (<.01%)
    sends in affected programs: 46 -> 42 (-8.70%)
    helped: 2
    HURT: 0

    Iron Lake and GM45 had similar results. (Iron Lake shown)
    total instructions in shared programs: 8122757 -> 8122617 (<.01%)
    instructions in affected programs: 14716 -> 14576 (-0.95%)
    helped: 46
    HURT: 1
    helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
    helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
    95% mean confidence interval for instructions value: -3.42 -2.54
    95% mean confidence interval for instructions %-change: -3.28% -1.62%
    Instructions are helped.

    total cycles in shared programs: 188510100 -> 188509780 (<.01%)
    cycles in affected programs: 58994 -> 58674 (-0.54%)
    helped: 32
    HURT: 1
    helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
    helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
    95% mean confidence interval for cycles value: -16.34 -3.06
    95% mean confidence interval for cycles %-change: -2.46% -0.15%
    Cycles are helped.
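The effect on the block_1 example from the message can be tallied with a toy counter. This is a simplified model: it treats every listed modifier-like opcode as free, whereas the real pass may apply further conditions, and the `FREE_OPS` set and function name are invented for the illustration.

```python
# Opcodes the message says usually fold into source/destination modifiers.
FREE_OPS = {"fsat", "fneg", "fabs", "ineg", "iabs"}


def count_instructions(ops, skip_modifiers=True):
    """Count the opcodes in a block, optionally ignoring modifier-like ones."""
    return sum(1 for op in ops if not (skip_modifiers and op in FREE_OPS))


# Opcodes of block_1 in the example from the commit message.
block_1 = ["fabs", "fneg", "fabs", "fneg", "fmul", "fsat"]

print(count_instructions(block_1, skip_modifiers=False))  # every opcode counted
print(count_instructions(block_1))                        # only the fmul counted
```

With a caller limit of 1, the first count rejects the block while the second accepts it, which is the kind of if-statement the change lets the pass convert to a select.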
* nir/lower_io_to_vector: don't create arrays when not needed (Rhys Perry, 2019-12-02, 1 file, -1/+7)
    Some backends require that there are no array varyings. If there were
    no arrays in the input shader, the pass shouldn't have to create new
    ones.

    Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
    Fixes: bcd14756eec ('nir/lower_io_to_vector: add flat mode')
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>
* nir/samplers: don't zero samplers_used/txf. (Dave Airlie, 2019-12-02, 1 file, -3/+0)
    This allows this pass to be run multiple times, with the results
    simply or'ed together.

    It fixes one test on llvmpipe nir, and regresses none.

    Suggested by Kenneth
    Reviewed-by: Marek Olšák <[email protected]>
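Why not zeroing matters can be seen with a plain integer bitmask. This is a toy model with invented names; it only shows that a pass which ORs into an existing mask can be run repeatedly without losing what earlier runs recorded.

```python
def gather_samplers(used_mask, sampler_indices):
    """OR the bit for each sampler index into the mask that was passed in."""
    for index in sampler_indices:
        used_mask |= 1 << index
    return used_mask


mask = 0
mask = gather_samplers(mask, [0, 2])  # first run sees samplers 0 and 2
mask = gather_samplers(mask, [1])     # second run sees sampler 1
print(bin(mask))                      # bits from both runs are still set
```

Had the function started from zero on each call, the second run would have discarded bits 0 and 2.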
* glsl: handle max uniform limits with lower_const_arrays_to_uniforms (Tapani Pälli, 2019-11-28, 3 files, -5/+40)
    Fixes arb_tessellation_shader-large-uniforms Piglit test.

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* driconf, glsl: Add a vs_position_always_invariant option (Kenneth Graunke, 2019-11-27, 1 file, -0/+6)
    Many applications use multi-pass rendering and require their vertex
    shader position to be computed the same way each time. Optimizations
    may consider, say, fusing a multiply-add based on global usage of an
    expression in a shader. But a second shader with the same expression
    may have different code, causing that optimization to make the other
    choice the second time around.

    The correct solution is for applications to mark their VS outputs
    'invariant', indicating they need multiple shaders to compute that
    output in the same manner. However, most applications fail to do so.

    So, we add a new driconf option - vs_position_always_invariant -
    which forces the gl_Position output in vertex shaders to be marked
    invariant.

    Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Ian Romanick <[email protected]>
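The multiply-add case can be made concrete with doubles. In the sketch below the fused result is emulated with exact rational arithmetic and a single final rounding, which is an assumption standing in for a hardware fused multiply-add; the input values are chosen only to expose the difference.

```python
from fractions import Fraction

a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

# Separate multiply then add: the product is rounded before the addition.
separate = a * b + c

# Emulated fused multiply-add: exact product and sum, rounded once at the end.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(separate)  # the low-order bit of the product was rounded away
print(fused)     # the low-order bit survives
```

Two shaders that evaluate the same position expression, one fused and one not, can therefore disagree in the last bits, which is the mismatch between passes that the invariant marking prevents.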