path: root/src/compiler
Each entry below: commit message  [Author, Age, Files changed, Lines -removed/+added]
* nir: fix s/&&/||/ typo  [Eric Engestrom, 2019-06-07, 1 file, -1/+1]
    Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16  [Eduardo Lima Mitev, 2019-06-07, 2 files, -0/+55]
    For umul_low (al * bl), zero is returned if the low 16-bit word of either source is zero.
    For imadsh_mix16 ((ah * bl << 16) + c), c is returned if either 'ah' or 'bl' is zero.

    A couple of nir_search_helpers are added: is_upper_half_zero() returns true if the high
    16-bit word of every component of an integer NIR ALU src is zero, and is_lower_half_zero()
    returns true if the low 16-bit word of every component is zero.

    Reviewed-by: Eric Anholt <[email protected]>
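    A plain-C sketch of the semantics these rules rely on (illustrative reference code only,
    not the NIR/ir3 implementation; the actual rules live in nir_opt_algebraic):

        #include <stdint.h>

        /* Reference semantics, with al/ah and bl/bh the 16-bit halves of the sources. */
        static uint32_t umul_low(uint32_t a, uint32_t b)
        {
           return (a & 0xffff) * (b & 0xffff);                  /* al * bl */
        }

        static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
        {
           return (((a >> 16) * (b & 0xffff)) << 16) + c;       /* (ah * bl << 16) + c */
        }

        /* The new optimizations follow directly from these definitions:
         *   (a & 0xffff) == 0 || (b & 0xffff) == 0   =>   umul_low(a, b)        == 0
         *   (a >> 16)    == 0 || (b & 0xffff) == 0   =>   imadsh_mix16(a, b, c) == c
         */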
* nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes  [Eduardo Lima Mitev, 2019-06-07, 1 file, -1/+14]
    'umul_low' is the low 32 bits of an unsigned integer multiply. It maps directly to ir3's
    MULL_U. 'imadsh_mix16' is multiply-add with shift and mix, an ir3-specific instruction
    that maps directly to ir3's IMADSH_M16.

    Both are necessary for the lowering of integer multiplication on Freedreno, which will be
    introduced later in this series.

    Reviewed-by: Eric Anholt <[email protected]>
* glsl/loop_analysis: Don't search for NULL variables in the hash table  [Jason Ekstrand, 2019-06-06, 1 file, -0/+3]
    Reviewed-by: Kenneth Graunke <[email protected]>

* nir/propagate_invariant: Don't add NULL vars to the hash table  [Jason Ekstrand, 2019-06-06, 1 file, -1/+10]
    Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
* nir: Combine lower_fmod16/32 back into a single lower_fmod.  [Kenneth Graunke, 2019-06-05, 2 files, -5/+4]
    We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit
    lowering into separate flags, with the rationale that some drivers might want different
    options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit
    ca31df6f.

    Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we
    re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not
    aware of any hardware which needs lowering for one bit size and not the other.

    Reviewed-by: Marek Olšák <[email protected]>
* nir: Drop lower_fmod64 option.  [Kenneth Graunke, 2019-06-05, 2 files, -2/+0]
    nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64.
    The version there also better handles imprecisions due to lowered frcp@64. Let's
    consolidate on one version.

    Reviewed-by: Marek Olšák <[email protected]>

* nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1  [Jason Ekstrand, 2019-06-05, 2 files, -10/+16]
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* nir: Don't replace the nir_shader when NIR_TEST_CLONE=1  [Jason Ekstrand, 2019-06-05, 2 files, -2/+42]
    Instead, we add a new helper which stomps one nir_shader and replaces it with another.
    The new helper effectively just changes which pointer gets used for the base nir_shader.
    It should be 99% as good at testing cloning but without requiring that everything handle
    having the shader swapped out from under it constantly.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
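    A toy C sketch of the stomp-and-replace idea, using a made-up struct rather than the real
    nir_shader or the new helper's actual signature:

        #include <stdlib.h>

        struct toy_shader { unsigned num_instrs; int *instrs; };   /* stand-in for nir_shader */

        /* Overwrite dst's contents with src's and discard the src wrapper, so every
         * existing pointer to dst keeps working and now refers to the cloned data. */
        static void toy_shader_replace(struct toy_shader *dst, struct toy_shader *src)
        {
           free(dst->instrs);
           *dst = *src;
           free(src);
        }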
* nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)  [Alyssa Rosenzweig, 2019-06-04, 1 file, -0/+1]
    This pattern was noticed in glmark's jellyfish scene.

    v2: Add inexact qualifier due to NaN behaviour.

    Minimal shader-db changes (slightly helped).

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Elie Tournier <[email protected]>
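    A small C illustration of why the rule needs the inexact qualifier, assuming IEEE fmaxf
    semantics (illustrative only, not taken from the commit):

        #include <math.h>

        /* For any non-NaN a, |a| >= 0, so fmaxf(fabsf(a), 0.0f) == fabsf(a) and the max can
         * be dropped.  With a NaN input the two differ: fmaxf(NaN, 0.0f) returns 0.0f while
         * fabsf(NaN) stays NaN, hence the rule is only applied in inexact (fast-math) mode. */
        static float before_opt(float a) { return fmaxf(fabsf(a), 0.0f); }
        static float after_opt(float a)  { return fabsf(a); }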
* spirv: Update the OpenCL.std.h header  [Caio Marcelo de Oliveira Filho, 2019-06-04, 2 files, -144/+339]
    This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on GitHub.

    We previously tweaked OpenCL.std.h from upstream so it could be included from C code.
    Now the upstream header can be included directly, but the symbol names are slightly
    different (they carry an OpenCLstd_ prefix), so this patch also fixes vtn_opencl.c to
    use those.

    Reviewed-by: Karol Herbst <[email protected]>
* spirv: Implement SPV_EXT_fragment_shader_interlock  [Jason Ekstrand, 2019-06-04, 2 files, -0/+38]
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

* spirv: Update the headers from latest Khronos master  [Jason Ekstrand, 2019-06-04, 2 files, -3/+330]
    This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in
    https://github.com/KhronosGroup/SPIRV-Headers.

    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

* spirv: Like Uniform, do nothing for UniformId  [Caio Marcelo de Oliveira Filho, 2019-06-03, 2 files, -0/+3]
    Reviewed-by: Jason Ekstrand <[email protected]>

* spirv: Implement SpvOpCopyLogical  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+2]
    This is the same as SpvOpCopyObject but without the type checking, which is how
    vtn_composite_copy works, so we just need to hook the operation.

    Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Generalize OpSelect  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -38/+48]
    SPIR-V 1.4 supports OpSelect over any composite type, and also allows a scalar boolean
    condition for vector types -- a case we already handled to support old GLSLang.

    Added a helper function to recursively perform nir_bcsel, which makes it easier to
    support structs.

    v2: Replace asserts() with vtn_fail_if(). (Jason)
    v3: Simplify the Condition and Result type verifications. (Jason)

    Reviewed-by: Jason Ekstrand <[email protected]>
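    A plain-C sketch of the recursive idea, using toy composite types rather than the actual
    SPIR-V/NIR data structures:

        struct toy_vec4   { float v[4]; };
        struct toy_struct { struct toy_vec4 position; float weight; };

        /* A select over a composite decomposes into selects over its members, recursing
         * until scalar or vector leaves where an ordinary bcsel-style select applies. */
        static struct toy_struct
        select_toy_struct(_Bool cond, struct toy_struct a, struct toy_struct b)
        {
           struct toy_struct r;
           for (int i = 0; i < 4; i++)
              r.position.v[i] = cond ? a.position.v[i] : b.position.v[i];
           r.weight = cond ? a.weight : b.weight;
           return r;
        }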
* spirv: Move OpSelect handling to a function  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -60/+66]
    This will make a later change easier to review.

    Reviewed-by: Jason Ekstrand <[email protected]>

* nir/vars_to_ssa: Handle UNDEF_NODE in more places  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -4/+8]
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832
    Fixes: 911ea2c66fc "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer"
    Reviewed-by: Jason Ekstrand <[email protected]>

* spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+64]
    Reviewed-by: Jason Ekstrand <[email protected]>

* nir: Add functions to subtract and compare addresses  [Caio Marcelo de Oliveira Filho, 2019-06-03, 2 files, -0/+54]
    v2: Fix comparing addresses from formats that have more than one component by using
        nir_ball_iequal(). (Jason)

    Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_ball_iequal() helper  [Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+13]
    Similar to nir_bany_inequal(). Suggested by Jason.

    Reviewed-by: Jason Ekstrand <[email protected]>
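    A plain-C model of the "boolean AND of all components" reduction the helper performs,
    which is what the multi-component address comparison above relies on (illustrative only):

        #include <stdbool.h>
        #include <stdint.h>

        /* Compare two values component-wise and AND the per-component results into a single
         * boolean -- the counterpart of bany_inequal's OR-of-inequalities reduction. */
        static bool ball_iequal(const uint32_t *a, const uint32_t *b, unsigned num_components)
        {
           bool all_equal = true;
           for (unsigned i = 0; i < num_components; i++)
              all_equal &= (a[i] == b[i]);
           return all_equal;
        }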
* nir: copy intrinsic type when lowering load input/uniform and store output  [Jonathan Marek, 2019-06-03, 1 file, -0/+2]
    Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Erico Nunes <[email protected]>
    Tested-by: Erico Nunes <[email protected]>
    Tested-by: Andreas Baierl <[email protected]>
* nir: Return nir_type_invalid for non-numeric base types  [Caio Marcelo de Oliveira Filho, 2019-05-31, 1 file, -2/+14]
    Now that the type-gathering function looks at instructions that might have other types,
    return an invalid type instead of crashing. That invalid type will be properly ignored
    later.

    Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir: remove bool lowering from lower_int_to_float  [Jonathan Marek, 2019-05-31, 1 file, -71/+42]
    Removes the bool_to_float logic from the int_to_float pass, so that the two can be used
    separately. Having separate passes gives us better validation and makes it possible to
    use int_to_float together with the lower_ftrunc option (int lowering generates ftrunc,
    but lower_ftrunc generates bools; ftrunc lowering should probably be reworked). For now
    we always expect lower_bool to come after lower_int.

    Also fixes f2i32 to become ftrunc and adds the u2f/f2u cases.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
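    A rough C sketch of what the conversions turn into under the int-in-float representation
    this pass assumes (illustrative only, not the pass's actual code):

        #include <math.h>

        /* Integers are carried in float registers, so "converting" a float to an int just
         * drops the fractional part, and converting an int/uint to float is a plain move. */
        static float lower_f2i32(float src) { return truncf(src); }   /* f2i32 -> ftrunc */
        static float lower_u2f(float src)   { return src; }           /* u2f   -> mov    */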
* nir: fix lower_{int,bool}_to_float for new mov opcode  [Jonathan Marek, 2019-05-31, 2 files, -0/+2]
    It is treated like the vecN instructions which also have no type.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>

* nir: add lower_bitshift option  [Jonathan Marek, 2019-05-31, 2 files, -3/+8]
    Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and
    lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in
    int_to_float lowering.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
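    A minimal sketch of the ishl-by-constant lowering (the shift amount is a compile-time
    constant in the pass; it is a parameter here only for illustration):

        #include <stdint.h>

        /* a << shift  ==  a * (1 << shift)  in 32-bit wrapping arithmetic, so a constant
         * left shift can be replaced by a multiply and the bitshift disappears. */
        static uint32_t lower_ishl_by_const(uint32_t a, unsigned shift)
        {
           return a * (UINT32_C(1) << shift);
        }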
* nir: fix gather_ssa_types  [Jonathan Marek, 2019-05-31, 1 file, -36/+52]
    Consts and undefs can be used as different types (common with the "0" constant), so don't
    copy types from consts/undefs, only to them. This doesn't entirely solve the problem that
    the type given to the const could be wrong, but now the only realistic case is with "0",
    which is the same when cast to float, so it doesn't matter for lower_int_to_float.

    The other change is to get type information for load input/uniform and store output, and
    use that to get correct results.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
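    A small C illustration of why a mistyped "0" constant is harmless here, assuming 32-bit
    IEEE floats:

        #include <stdint.h>
        #include <string.h>

        /* The 32-bit pattern 0x00000000 reads as integer 0 and also as float 0.0f, so using
         * the "0" constant with the wrong type still yields the same value, which is why the
         * remaining ambiguity does not hurt lower_int_to_float. */
        static float bits_as_float(uint32_t bits)
        {
           float f;
           memcpy(&f, &bits, sizeof f);
           return f;   /* bits_as_float(0) == 0.0f */
        }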
* nir: add type information to load uniform/input and store output intrinsics  [Jonathan Marek, 2019-05-31, 4 files, -10/+42]
    This type information will be used by gather_ssa_types to get usable results.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir: improvements to native_integers removal  [Jonathan Marek, 2019-05-31, 1 file, -10/+0]
    Improvements related to the patch that removed native_integers:
    * In glsl_to_nir, special cases for i2f, u2f, etc. are no longer needed.
    * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed.

    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* nir/instr_set: Use _mesa_set_search_or_add()  [Connor Abbott, 2019-05-31, 1 file, -5/+3]
    Before this change, we were searching for each instruction twice, once when checking if
    it exists and once when figuring out where to insert it. By using the new function, we
    can do everything we need to do in one operation.

    Compilation time numbers for my shader-db database:

    Difference at 95.0% confidence
        -4.04706 +/- 0.669508
        -0.922142% +/- 0.151948%
        (Student's t, pooled s = 0.95824)

    Reviewed-by: Eric Anholt <[email protected]>
    Acked-by: Jason Ekstrand <[email protected]>
* nir: Rematerialize compare instructions  [Ian Romanick, 2019-05-31, 4 files, -0/+185]
    On some architectures, Boolean values used to control conditional branches or conditional
    selection must be propagated into a flag. This generally means that a stored Boolean
    value must be compared with zero. Rather than force the generation of extra compares with
    zero, re-emit the original comparison instruction. This can save register pressure by not
    needing to store the Boolean value.

    There are several possible areas for future improvement to this pass:

    1. Be more conservative. If both sources to the comparison instruction are non-constants,
       it may be better for register pressure to emit the extra compare. The current
       shader-db results on Intel GPUs (next commit) lead me to believe that this is not
       currently a problem.

    2. Be less conservative. Currently the pass requires that all users of the comparison
       match the pattern. The idea is that after the pass is complete, no instruction will
       use the resulting Boolean value; the only uses will be of the flag value. It may be
       beneficial to relax this requirement in some cases.

    3. Be less conservative. Also try to rematerialize comparisons used for discard_if
       intrinsics. After changing the way the Intel compiler generates code for discard_if
       (see MR!935), I tried implementing this already. The changes were pretty small.
       Instructions were helped in 19 shaders, but, overall, cycles were hurt. A commit "nir:
       Rematerialize comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

    4. Copy the preceding ALU instruction. If the comparison is a comparison with zero, and
       it is the only user of a particular ALU instruction (e.g., (a+b) != 0.0), it may be a
       further improvement to also copy the preceding ALU instruction. On Intel GPUs, this
       may enable cmod propagation to make additional progress.

    v2: Use a much simpler method to get the prev_block for an if-statement. Suggested by Tim.

    Reviewed-by: Matt Turner <[email protected]>
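    A scalar C analogy of the transform (conceptual only; the pass itself works on NIR
    comparisons feeding bcsel/if):

        /* Before: the Boolean result is stored and kept live, and the backend must later
         * compare it with zero to set the flags for the select or branch. */
        static float select_before(float a, float b, float x, float y)
        {
           int cond = a < b;       /* Boolean lives in a register across ... */
           /* ... unrelated work ... */
           return cond ? x : y;    /* ... and is re-tested against zero here */
        }

        /* After: the comparison is re-emitted right at its use, so no Boolean value needs
         * to stay live, which can reduce register pressure. */
        static float select_after(float a, float b, float x, float y)
        {
           /* ... unrelated work ... */
           return (a < b) ? x : y;
        }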
* nir: Add a shallow clone function for nir_alu_instr  [Ian Romanick, 2019-05-31, 2 files, -0/+23]
    Reviewed-by: Jason Ekstrand <[email protected]>
    Suggested-by: Jason Ekstrand <[email protected]>
    Suggested-by: Matt Turner <[email protected]>
* nir: Actually propagate progress in nir_opt_move_load_ubo.  [Bas Nieuwenhuizen, 2019-05-31, 1 file, -1/+1]
    Found with Jason's new metadata rework
    (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

    Fixes: af355aaa071 "nir: add nir_opt_move_load_ubo() optimization pass"
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* nir/split_vars: Properly bail in the presence of complex derefs  [Jason Ekstrand, 2019-05-31, 1 file, -9/+106]
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

* nir/vars_to_ssa: Properly ignore variables with complex derefs  [Jason Ekstrand, 2019-05-31, 1 file, -14/+64]
    Because the core principle of the vars_to_ssa pass is that it globally (within a
    function) looks at all of the uses of a never-indirected path and does a full into-SSA
    on that path, it can't handle a path which has any chance of having aliasing. If a
    function_temp variable has a cast or anything else which may cause aliasing, we have to
    assume that all paths to that variable may alias and ignore the entire variable.

    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
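    A plain-C analogue of the aliasing hazard described above (illustrative, assuming a cast
    on a local variable):

        struct pair { int a; int b; };

        static int aliasing_example(void)
        {
           struct pair p = { 1, 2 };
           int *q = (int *)&p;   /* the cast creates a second name for p's storage */
           q[0] = 42;            /* updates p.a through the alias */
           return p.a;           /* 42: a pass tracking the path "p.a" in isolation would
                                  * get this wrong unless it assumes the paths may alias */
        }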
* nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer  [Jason Ekstrand, 2019-05-31, 1 file, -3/+5]
    We're about to change the meaning of get_deref_node returning NULL, so we need a non-NULL
    value to mean properly undefined.

    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

* nir/deref: Add a has_complex_use helper  [Jason Ekstrand, 2019-05-31, 2 files, -0/+79]
    This lets passes easily detect derefs which have uses that fall outside the standard
    load/store/copy pattern so they can bail appropriately.

    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>

* nir/dead_cf: Call instructions aren't dead  [Jason Ekstrand, 2019-05-31, 1 file, -1/+1]
    When we inlined cf_node_has_side_effects into node_is_dead, all the conditions flipped
    and we forgot to flip one. Fortunately, it doesn't matter right now because no one uses
    this pass on shaders with more than one function.

    Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* vtn: create cast with type stride.  [Dave Airlie, 2019-05-31, 1 file, -1/+1]
    When creating function parameters, we create pointers from ssa values; this creates nir
    casts with stride 0, but we have nowhere else to get this value from. Later passes that
    lower explicit io need this stride value to do the right thing.

    Reviewed-by: Karol Herbst <[email protected]>
* nir: Accept nir_var_mem_global in derefs used by phis  [Caio Marcelo de Oliveira Filho, 2019-05-30, 1 file, -1/+2]
    This mode is used by the PhysicalStorageBufferEXT storage class.

    Fixes: 8bdf5a008b3 "nir: Allow derefs to be used as phi sources"
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: silence three compiler warnings seen with MinGW  [Brian Paul, 2019-05-29, 3 files, -5/+3]
    Silence two unused-variable warnings, and initialize elem_size and elem_align to zero to
    silence "maybe uninitialized" warnings.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
* spirv: Change spirv_to_nir() to return a nir_shader  [Caio Marcelo de Oliveira Filho, 2019-05-29, 3 files, -12/+13]
    spirv_to_nir() used to return the nir_function corresponding to the entry point, as a
    way to identify it. There's now a bool is_entrypoint in nir_function and also a helper
    function to get the entry point from a nir_shader, so the new return type better
    reflects what the function name suggests.

    It also helps drivers avoid the mistake of reusing internal shader references after
    running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be
    invalidated right in the first pass executed.

    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: Allow derefs to be used as phi sources  [Caio Marcelo de Oliveira Filho, 2019-05-29, 3 files, -2/+17]
    It is possible and valid for a pointer to be selected based on a conditional before
    being used, and depending on the mode, those cases will result in a phi with derefs as
    sources.

    To achieve this, we don't rematerialize derefs that are used by phis. As a consequence,
    when converting from SSA to regs, we may have phis that come from different blocks and
    are used by phis. We now convert those to regs too.

    Validation was added to ensure only derefs of certain modes can be used as phi sources.
    No extra validation is needed for the presence of casts; any instruction that uses
    derefs will validate that the deref chain is complete (ending in a cast or a var).

    Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_non_uniform: safely iterate over blocks  [Lionel Landwerlin, 2019-05-28, 1 file, -1/+1]
    This fixes a problem where the same instruction gets replaced twice. This was happening
    when the replaced instruction would be at the end of a block.

    Replacement of:

        if ssa_8 {
            ....
            intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
        }

    Would be:

        if ssa_8 {
            loop {
                vec1 32 ssa_47 = intrinsic read_first_invocation (ssa_44) ()
                vec1 1 ssa_48 = ieq ssa_47, ssa_44
                if ssa_48 {
                    loop {
                        vec1 32 ssa_49 = intrinsic read_first_invocation (ssa_44) ()
                        vec1 1 ssa_50 = ieq ssa_49, ssa_44
                        if ssa_50 {
                            intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */
                            break
                        } else {
                            ....
                        }

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Fixes: 3bd545764151 ("nir: Add a lowering pass for non-uniform resource access")
    Reviewed-by: Jason Ekstrand <[email protected]>
* st/nir: Re-vectorize shader IO  [Kenneth Graunke, 2019-05-28, 1 file, -0/+6]
    We scalarize IO to enable further optimizations, such as propagating constant components
    across shaders, eliminating dead components, and so on. This patch attempts to
    re-vectorize those operations after the varying optimizations are done.

    Intel GPUs are a scalar architecture, but IO operations work on whole vec4's at a time,
    so we'd prefer to have a single IO load per vector rather than 4 scalar IO loads. This
    re-vectorization can help a lot.

    Broadcom GPUs, however, really do want scalar IO. radeonsi may want this, or may want to
    leave it to LLVM. So, we make a new flag in the NIR compiler options struct, and key it
    off of that, allowing drivers to pick. (It's a bit awkward because we have per-stage
    settings, but this is about IO between two stages... but I expect drivers to globally
    prefer one way or the other. We can adjust later if needed.)

    Reviewed-by: Marek Olšák <[email protected]>

* nir: Drop imov/fmov in favor of one mov instruction  [Jason Ekstrand, 2019-05-24, 25 files, -54/+42]
    The difference between imov and fmov has been a constant source of confusion in NIR for
    years. No one really knows why we have two or when to use one vs. the other. The real
    reason is that they do different things in the presence of source and destination
    modifiers. However, without modifiers (which many back-ends don't have), they are
    identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
    instruction in place rather than replacing them with imov or fmov instructions, we don't
    need two different instructions at all anymore.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Vasily Khoruzhick <[email protected]>
    Acked-by: Rob Clark <[email protected]>

* nir/builder: Merge nir_[if]mov_alu into one nir_mov_alu helper  [Jason Ekstrand, 2019-05-24, 4 files, -30/+12]
    Unless source modifiers are present, fmov and imov are the same. There's no good reason
    for having two helpers.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>
* nir/lower_to_source_mods: Stop turning add, sat, and neg into mov  [Jason Ekstrand, 2019-05-24, 2 files, -40/+19]
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>
* nir/source_mods: Add helpers for setting source modifiers  [Jason Ekstrand, 2019-05-24, 1 file, -6/+18]
    It's potentially a tiny bit less efficient, but the helpers make it much easier to sort
    out the rules for updating source modifiers.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>
* nir/builder: Remove the use_fmov parameter from nir_swizzle  [Jason Ekstrand, 2019-05-24, 8 files, -33/+32]
    This flag has caused more confusion than good in most cases. You can validly use imov
    for floats or fmov for integers because, without source modifiers, neither modifies its
    input in any way. Using imov for floats is more reliable, so we go that direction.

    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>