path: root/src/compiler
Commit message (Author, Age, Files, Lines)
* glsl: Fix out of bounds read in shader_cache_read_program_metadata (Kenneth Graunke, 2019-06-17, 1 file, -3/+2)
  The VaryingNames array has NumVaryings entries. But BufferStride is a small array of MAX_FEEDBACK_BUFFERS (4) entries. Programs with more than 4 varyings would read out of bounds.

  Also, BufferStride is set based on the shader itself, which means that it's inherently already included in the hash, and doesn't need to be included again. At the point when shader_cache_read_program_metadata is called, the linker hasn't even set those fields yet. So, just drop it entirely.

  Fixes valgrind errors in KHR-GL45.transform_feedback.linking_errors_test.

  Fixes: 6d830940f78 glsl/shader_cache: Allow shader cache usage with transform feedback
  Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Set default precision on record members (Neil Roberts, 2019-06-14, 1 file, -1/+10)
  Record types have their own slot to store the precision for each member in glsl_struct_field. Previously if the member didn’t have an explicit precision qualifier this was being left as GLSL_PRECISION_NONE. This patch makes it take into account the type’s default precision qualifier like it does for regular variables in apply_type_qualifier_to_variable.

  This has the additional benefit of correctly reporting an error when a float type is used in a struct without declaring the default type.

  Reviewed-by: Eric Anholt <[email protected]>
* glsl/linker: Make precision matching optional in intrastage_match (Neil Roberts, 2019-06-14, 3 files, -8/+24)
  Confusingly, this function is used to match interstage interfaces as well as intrastage ones. In the interstage case it needs to avoid comparing the precisions. This patch adds a parameter to specify whether to take the precision into account or not so that it can be used for both cases.

  Reviewed-by: Eric Anholt <[email protected]>
* glsl/linker: Don’t check precision for shader interface (Neil Roberts, 2019-06-14, 1 file, -2/+5)
  On GLES, the interface between vertex and fragment shaders doesn’t need to have matching precision.

  Section 4.3.10 of the GLSL ES 3.00 spec:

  “The type of vertex outputs and fragment inputs with the same name must match, otherwise the link command will fail. The precision does not need to match.”

  Reviewed-by: Eric Anholt <[email protected]>
* compiler/types: Making comparing record precision optional (Neil Roberts, 2019-06-14, 2 files, -5/+53)
  On GLES, the interface between vertex and fragment shaders doesn’t need to have matching precision. This adds an extra argument to glsl_types::record_compare to disable the precision comparison. This will later be used for the shader interface check.

  In order to make this work this patch also adds a helper function to recursively compare types while ignoring the precision.

  v2: Call record_compare from within compare_no_precision to avoid duplicating code (Eric Anholt).

  Reviewed-by: Eric Anholt <[email protected]>
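The recursive comparison described above can be sketched in Python. This is only a toy model under stated assumptions: the `Type` and `Field` classes are hypothetical stand-ins for Mesa's C++ type objects, and only the names `record_compare` and `compare_no_precision` come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Field:
    # Hypothetical model of one struct member.
    name: str
    type: "Type"
    precision: str = "none"  # e.g. "lowp", "mediump", "highp"


@dataclass(frozen=True)
class Type:
    # Hypothetical model of a GLSL type.
    base: str                         # "float", "struct", "array", ...
    fields: Tuple[Field, ...] = ()    # members, for struct types
    element: Optional["Type"] = None  # element type, for array types
    length: int = 0                   # element count, for array types


def record_compare(a, b, match_precision=True):
    """Compare two struct types member by member."""
    if len(a.fields) != len(b.fields):
        return False
    for fa, fb in zip(a.fields, b.fields):
        if fa.name != fb.name:
            return False
        if match_precision:
            # Strict mode: precision and the full member type must agree.
            if fa.precision != fb.precision or fa.type != fb.type:
                return False
        elif not compare_no_precision(fa.type, fb.type):
            return False
    return True


def compare_no_precision(a, b):
    """Recursively compare two types while ignoring member precision."""
    if a == b:
        return True
    if a.base != b.base:
        return False
    if a.base == "array":
        return a.length == b.length and compare_no_precision(a.element, b.element)
    if a.base == "struct":
        # Reuse record_compare rather than duplicating the member loop.
        return record_compare(a, b, match_precision=False)
    return False
```

Having `compare_no_precision` call back into `record_compare` mirrors the v2 note: the member loop lives in one place, and the flag decides how nested types are compared.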
* nir: detect more dynamically uniform expressions (Iago Toral Quiroga, 2019-06-14, 1 file, -0/+13)
  Shader-db results for v3d:

  total instructions in shared programs: 9132728 -> 9119238 (-0.15%)
  instructions in affected programs: 596886 -> 583396 (-2.26%)
  helped: 1118
  HURT: 224

  total threads in shared programs: 234298 -> 234308 (<.01%)
  threads in affected programs: 10 -> 20 (100.00%)
  helped: 5
  HURT: 0

  total uniforms in shared programs: 3022949 -> 3022622 (-0.01%)
  uniforms in affected programs: 29163 -> 28836 (-1.12%)
  helped: 108
  HURT: 37

  total max-temps in shared programs: 1328030 -> 1327762 (-0.02%)
  max-temps in affected programs: 10097 -> 9829 (-2.65%)
  helped: 263
  HURT: 15

  total spills in shared programs: 3793 -> 3777 (-0.42%)
  spills in affected programs: 432 -> 416 (-3.70%)
  helped: 16
  HURT: 0

  total fills in shared programs: 4380 -> 4266 (-2.60%)
  fills in affected programs: 828 -> 714 (-13.77%)
  helped: 16
  HURT: 0

  Reviewed-by: Eric Anholt <[email protected]>
* nir: Don't manually index intrinsic index enum (Connor Abbott, 2019-06-13, 1 file, -20/+20)
  This fixes a rebase fail in ea51275e07b, and prevents it from happening again. There's no reason to do this manually.

  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv/nir: add support for AMD_shader_ballot and Groups capability (Daniel Schürmann, 2019-06-13, 5 files, -9/+136)
  This commit also renames existing AMD capabilities:
    - gcn_shader -> amd_gcn_shader
    - trinary_minmax -> amd_trinary_minmax

  Reviewed-by: Connor Abbott <[email protected]>
* nir: add intrinsics for AMD_shader_ballot (Daniel Schürmann, 2019-06-13, 3 files, -0/+31)
  Reviewed-by: Connor Abbott <[email protected]>
* nir/spirv: add support for the SubgroupBallotKHR SPIR-V capability (Daniel Schürmann, 2019-06-13, 2 files, -7/+13)
  This capability is required for the VK_EXT_shader_subgroup_ballot extension.

  Reviewed-by: Connor Abbott <[email protected]>
* nir/spirv: add support for the SubgroupVoteKHR SPIR-V capability (Daniel Schürmann, 2019-06-13, 2 files, -4/+20)
  This capability is required for the VK_EXT_shader_subgroup_vote extension.

  Reviewed-by: Connor Abbott <[email protected]>
* glsl: Check order and uniqueness of interlock functions (Caio Marcelo de Oliveira Filho, 2019-06-10, 4 files, -4/+35)
  With this commit all remaining compilation tests in Piglit for ARB_fragment_shader_interlock will pass.

  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Plamena Manolova <[email protected]>
* glsl: Make interlock builtins follow same compiler rules as barriers (Caio Marcelo de Oliveira Filho, 2019-06-10, 1 file, -5/+10)
  Generalize the barrier code to provide correct error messages for other builtins.

  Fixes most of the Piglit compilation tests for ARB_fragment_shader_interlock.

  Reviewed-by: Tapani Pälli <[email protected]>
  Reviewed-by: Plamena Manolova <[email protected]>
* nir/opt_algebraic: Fix rules for imadsh_mix16 (Eduardo Lima Mitev, 2019-06-10, 1 file, -2/+2)
  The rules added in patch 3addd7c are inverted. It should be:

    (al * bh) << 16 + c

  instead of:

    (ah * bl) << 16 + c

  Fixes a number of regressions under dEQP-GLES31.functional.draw_indirect.compute_interop.large.* on Freedreno.

  Reviewed-by: Rob Clark <[email protected]>
* nir: fix s/&&/||/ typo (Eric Engestrom, 2019-06-07, 1 file, -1/+1)
  Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16 (Eduardo Lima Mitev, 2019-06-07, 2 files, -0/+55)
  For umul_low (al * bl), zero is returned if the low 16-bit word of either source is zero.

  For imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl' is zero.

  A couple of nir_search_helpers are added:

  is_upper_half_zero() returns true if the highest word of all components of an integer NIR alu src is zero.

  is_lower_half_zero() returns true if the lowest word of all components of an integer NIR alu src is zero.

  Reviewed-by: Eric Anholt <[email protected]>
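As a rough illustration of the two predicates described above, here is a Python model. It assumes 32-bit integer components passed as a plain list; the real helpers are C functions operating on NIR ALU sources, so only the names and the intended meaning are taken from the commit message.

```python
def is_upper_half_zero(components):
    """True if bits 31..16 are zero in every 32-bit component."""
    return all(((c >> 16) & 0xFFFF) == 0 for c in components)


def is_lower_half_zero(components):
    """True if bits 15..0 are zero in every 32-bit component."""
    return all((c & 0xFFFF) == 0 for c in components)


def low_product(a, b):
    """al * bl, the product that umul_low is described as computing."""
    return (a & 0xFFFF) * (b & 0xFFFF)


def mixed_product_shifted(a, b):
    """(ah * bl) << 16, the multiply part of imadsh_mix16, wrapped to 32 bits."""
    return ((((a >> 16) & 0xFFFF) * (b & 0xFFFF)) << 16) & 0xFFFFFFFF
```

When `is_lower_half_zero` holds for either source, `low_product` is zero, so umul_low can be folded to zero. When the first source passes `is_upper_half_zero` or the second passes `is_lower_half_zero`, `mixed_product_shifted` is zero and imadsh_mix16 reduces to `c`.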
* nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes (Eduardo Lima Mitev, 2019-06-07, 1 file, -1/+14)
  'umul_low' is the low 32-bits of unsigned integer multiply. It maps directly to ir3's MULL_U.

  'imadsh_mix16' is multiply add with shift and mix, an ir3 specific instruction that maps directly to ir3's IMADSH_M16.

  Both are necessary for the lowering of integer multiplication on Freedreno, which will be introduced later in this series.

  Reviewed-by: Eric Anholt <[email protected]>
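To see why these two opcodes are enough to build a full 32-bit multiply, the Python sketch below models them using the formulas quoted elsewhere in this log (al * bl, and ah * bl << 16 + c). The `imul32` composition is an assumption made for illustration; the log does not show the actual lowering that landed later in the series.

```python
MASK32 = 0xFFFFFFFF


def umul_low(a, b):
    # Product of the low 16-bit halves (al * bl); always fits in 32 bits.
    return (a & 0xFFFF) * (b & 0xFFFF)


def imadsh_mix16(a, b, c):
    # ((ah * bl) << 16) + c, wrapped to 32 bits.
    ah = (a >> 16) & 0xFFFF
    bl = b & 0xFFFF
    return (((ah * bl) << 16) + c) & MASK32


def imul32(a, b):
    # Hypothetical composition: al*bl + ((ah*bl) << 16) + ((al*bh) << 16).
    # The ah*bh term is shifted out of the low 32 bits entirely.
    partial = imadsh_mix16(a, b, umul_low(a, b))
    return imadsh_mix16(b, a, partial)
```

Writing a = ah * 2^16 + al and b = bh * 2^16 + bl, the product is ah * bh * 2^32 + (ah * bl + al * bh) * 2^16 + al * bl, and the first term vanishes modulo 2^32. Swapping the operands in the second `imadsh_mix16` call supplies the al * bh cross term.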
* glsl/loop_analysis: Don't search for NULL variables in the hash table (Jason Ekstrand, 2019-06-06, 1 file, -0/+3)
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/propagate_invariant: Don't add NULL vars to the hash table (Jason Ekstrand, 2019-06-06, 1 file, -1/+10)
  Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: Combine lower_fmod16/32 back into a single lower_fmod. (Kenneth Graunke, 2019-06-05, 2 files, -5/+4)
  We originally had a single lower_fmod option. In commit 2ab2d2e5, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

  Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other.

  Reviewed-by: Marek Olšák <[email protected]>
* nir: Drop lower_fmod64 option. (Kenneth Graunke, 2019-06-05, 2 files, -2/+0)
  nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one version.

  Reviewed-by: Marek Olšák <[email protected]>
* nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1 (Jason Ekstrand, 2019-06-05, 2 files, -10/+16)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* nir: Don't replace the nir_shader when NIR_TEST_CLONE=1 (Jason Ekstrand, 2019-06-05, 2 files, -2/+42)
  Instead, we add a new helper which stomps one nir_shader and replaces it with another. The new helper effectively just changes which pointer gets used for the base nir_shader. It should be 99% as good at testing cloning but without requiring that everything handle having the shader swapped out from under it constantly.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a) (Alyssa Rosenzweig, 2019-06-04, 1 file, -0/+1)
  This pattern was noticed in glmark's jellyfish scene.

  v2: Add inexact qualifier due to NaN behaviour.

  Minimal shader-db changes (slightly helped).

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Elie Tournier <[email protected]>
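The reason the rule needs the inexact qualifier can be shown with a small Python check. The `fmax` helper below is a hypothetical model that returns the non-NaN operand when exactly one input is NaN; whether NIR's fmax behaves exactly this way on every backend is an assumption here, not something the log states.

```python
import math


def fmax(x, y):
    """Model of a max that returns the non-NaN operand if one input is NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y


def original(a):
    return fmax(abs(a), 0.0)


def simplified(a):
    return abs(a)


# For every non-NaN input, abs(a) is already >= 0.0, so both forms agree.
samples = [-3.5, -0.0, 0.0, 2.0, math.inf, -math.inf]
agree_on_numbers = all(original(a) == simplified(a) for a in samples)

# For NaN the forms differ: the max discards the NaN, abs() keeps it.
nan_original = original(math.nan)
nan_simplified = simplified(math.nan)
```

Under this model the rewrite changes the result only for NaN inputs, which is the kind of difference an inexact rule is allowed to introduce.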
* spirv: Update the OpenCL.std.h header (Caio Marcelo de Oliveira Filho, 2019-06-04, 2 files, -144/+339)
  This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on GitHub.

  We previously tweaked OpenCL.std.h from upstream so that it could be included in C code. The upstream header can now be included as-is, but its symbol names are slightly different (they include an OpenCLstd_ prefix), so this patch also fixes vtn_opencl.c to use those.

  Reviewed-by: Karol Herbst <[email protected]>
* spirv: Implement SPV_EXT_fragment_shader_interlock (Jason Ekstrand, 2019-06-04, 2 files, -0/+38)
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Update the headers from latest Khronos master (Jason Ekstrand, 2019-06-04, 2 files, -3/+330)
  This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in https://github.com/KhronosGroup/SPIRV-Headers.

  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* spirv: Like Uniform, do nothing for UniformId (Caio Marcelo de Oliveira Filho, 2019-06-03, 2 files, -0/+3)
  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Implement SpvOpCopyLogical (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+2)
  This is the same as SpvOpCopyObject but without the type checking, which is how vtn_composite_copy works, so we just need to hook the operation.

  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Generalize OpSelect (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -38/+48)
  SPIR-V 1.4 supports OpSelect over any composite type, and also allows scalar boolean condition for vector types -- a case which we already handled to support old GLSLang.

  Added a helper function to recursively perform nir_bcsel, which makes it easier to support structs.

  v2: Replace asserts() with vtn_fail_if(). (Jason)
  v3: Simplify Condition and Result types verifications. (Jason)

  Reviewed-by: Jason Ekstrand <[email protected]>
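The recursive selection idea can be sketched in Python with nested containers standing in for SPIR-V composites. This is a hypothetical model: structs are dicts, vectors and arrays are lists, and `select` plays the role that the commit assigns to a recursive nir_bcsel helper.

```python
def select(cond, a, b):
    """Pick a or b, recursing through composites.

    cond is either a single bool applied to the whole value, or a list of
    bools applied component-wise to a vector of the same length.
    """
    if isinstance(a, dict):
        # Struct: apply the same condition to every member.
        return {name: select(cond, a[name], b[name]) for name in a}
    if isinstance(a, list):
        if isinstance(cond, list):
            # Vector condition: one bool per component.
            return [select(c, x, y) for c, x, y in zip(cond, a, b)]
        # Scalar condition: applied to every element.
        return [select(cond, x, y) for x, y in zip(a, b)]
    # Scalar leaf: this is where a single bcsel would be emitted.
    return a if cond else b
```

For example, `select(False, {"pos": [1, 2], "w": 3}, {"pos": [4, 5], "w": 6})` walks the struct and returns the second operand's values, while `select([True, False], [1, 2], [3, 4])` mixes components from both operands.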
* spirv: Move OpSelect handling to a function (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -60/+66)
  This will make a later change easier to review.

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/vars_to_ssa: Handle UNDEF_NODE in more places (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -4/+8)
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832
  Fixes: 911ea2c66fc "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer"
  Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+64)
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add functions to subtract and compare addresses (Caio Marcelo de Oliveira Filho, 2019-06-03, 2 files, -0/+54)
  v2: Fix comparing addresses from formats that have more than one component by using nir_ball_iequal(). (Jason)

  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add nir_ball_iequal() helper (Caio Marcelo de Oliveira Filho, 2019-06-03, 1 file, -0/+13)
  Similar to nir_bany_inequal(). Suggested by Jason.

  Reviewed-by: Jason Ekstrand <[email protected]>
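What an "all components equal" reduction buys for multi-component addresses can be shown with a short Python model. The functions below are illustrative stand-ins that operate on tuples; the real helpers build NIR instructions, and the (buffer index, byte offset) address layout is an assumption chosen for the example.

```python
def ball_iequal(a, b):
    """True only if every integer component of a equals the matching one in b."""
    assert len(a) == len(b)
    return all(x == y for x, y in zip(a, b))


def bany_inequal(a, b):
    """True if at least one component differs."""
    assert len(a) == len(b)
    return any(x != y for x, y in zip(a, b))


# Hypothetical two-component addresses: (buffer index, byte offset).
addr_a = (2, 16)
addr_b = (2, 32)
same_address = ball_iequal(addr_a, addr_b)
```

Comparing only one component would wrongly report `addr_a` and `addr_b` as equal because they share a buffer index; reducing over all components avoids that.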
* nir: copy intrinsic type when lowering load input/uniform and store output (Jonathan Marek, 2019-06-03, 1 file, -0/+2)
  Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Erico Nunes <[email protected]>
  Tested-by: Erico Nunes <[email protected]>
  Tested-by: Andreas Baierl <[email protected]>
* nir: Return nir_type_invalid for non-numeric base types (Caio Marcelo de Oliveira Filho, 2019-05-31, 1 file, -2/+14)
  Now that the type gathering function looks at instructions that might have other types, return an invalid type instead of crashing. The invalid type will be properly ignored later.

  Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: remove bool lowering from lower_int_to_float (Jonathan Marek, 2019-05-31, 1 file, -71/+42)
  Removes the bool_to_float logic from the int_to_float pass, so that both can be used separately.

  By having separate passes we have better validation, and it becomes possible to use the pass together with the lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc generates bools; ftrunc lowering should probably be reworked).

  For now we always expect lower_bool to come after lower_int.

  Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: fix lower_{int,bool}_to_float for new mov opcode (Jonathan Marek, 2019-05-31, 2 files, -0/+2)
  It is treated like the vecN instructions which also have no type.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add lower_bitshift option (Jonathan Marek, 2019-05-31, 2 files, -3/+8)
  Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
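The identity behind lowering a constant left shift to a multiply is easy to check in Python. This is a sketch of the arithmetic only, assuming 32-bit wrapping integers and shift counts from 0 to 31; the function names are hypothetical and the actual rewrite rule is not shown in this log.

```python
MASK32 = 0xFFFFFFFF


def ishl(a, n):
    """32-bit left shift by a constant n in the range 0..31."""
    return (a << n) & MASK32


def ishl_as_imul(a, n):
    """The same value computed as a multiply by the constant 2**n."""
    return (a * (1 << n)) & MASK32
```

Because `n` is a compile-time constant, `1 << n` folds to a literal, so the lowered form contains a multiply and no shift.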
* nir: fix gather_ssa_types (Jonathan Marek, 2019-05-31, 1 file, -36/+52)
  Consts and undefs can be used as different types (common with the "0" constant), so don't copy types from consts/undefs, only to them. This doesn't entirely solve the problem that the type given to the const could be wrong, but now the only realistic case is "0", which is the same when cast to float, so it doesn't matter for lower_int_to_float.

  The other change is to get type information for load input/uniform and store output, and use that to get correct results.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add type information to load uniform/input and store output intrinsics (Jonathan Marek, 2019-05-31, 4 files, -10/+42)
  This type information will be used by gather_ssa_types to get usable results.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: improvements to native_integers removal (Jonathan Marek, 2019-05-31, 1 file, -10/+0)
  Improvements related to the patch that removed native_integers:
    * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
    * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/instr_set: Use _mesa_set_search_or_add() (Connor Abbott, 2019-05-31, 1 file, -5/+3)
  Before this change, we were searching for each instruction twice, once when checking if it exists and once when figuring out where to insert it. By using the new function, we can do everything we need to do in one operation.

  Compilation time numbers for my shader-db database:

  Difference at 95.0% confidence
    -4.04706 +/- 0.669508
    -0.922142% +/- 0.151948%
    (Student's t, pooled s = 0.95824)

  Reviewed-by: Eric Anholt <[email protected]>
  Acked-by: Jason Ekstrand <[email protected]>
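The difference between the two lookup patterns can be illustrated with a Python dict standing in for Mesa's set. This is an analogy rather than the C API: `dict.setdefault` plays the role of a combined search-or-add, and the function names are hypothetical.

```python
def add_with_two_lookups(table, key, instr):
    """Old pattern: one lookup to search, a second one to insert."""
    if key in table:       # first lookup
        return table[key]
    table[key] = instr     # second lookup to find the insertion slot
    return instr


def search_or_add(table, key, instr):
    """New pattern: a single operation that returns the existing entry
    or inserts the new one."""
    return table.setdefault(key, instr)


# Keys model the hashed contents of an instruction (opcode and sources).
table = {}
first = search_or_add(table, ("iadd", "a", "b"), "instr_1")
second = search_or_add(table, ("iadd", "a", "b"), "instr_2")
```

Here `second` is the entry inserted by the first call, which is the behaviour an instruction set needs in order to detect that an equivalent instruction already exists.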
* nir: Rematerialize compare instructions (Ian Romanick, 2019-05-31, 4 files, -0/+185)
  On some architectures, Boolean values used to control conditional branches or conditional selection must be propagated into a flag. This generally means that a stored Boolean value must be compared with zero. Rather than force the generation of extra compares with zero, re-emit the original comparison instruction. This can save register pressure by not needing to store the Boolean value.

  There are several possible areas for future improvement to this pass:

  1. Be more conservative. If both sources to the comparison instruction are non-constants, it may be better for register pressure to emit the extra compare. The current shader-db results on Intel GPUs (next commit) lead me to believe that this is not currently a problem.

  2. Be less conservative. Currently the pass requires that all users of the comparison match the pattern. The idea is that after the pass is complete, no instruction will use the resulting Boolean value. The only uses will be of the flag value. It may be beneficial to relax this requirement in some cases.

  3. Be less conservative. Also try to rematerialize comparisons used for discard_if intrinsics. After changing the way the Intel compiler generates code for discard_if (see MR!935), I tried implementing this already. The changes were pretty small. Instructions were helped in 19 shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

  4. Copy the preceding ALU instruction. If the comparison is a comparison with zero, and it is the only user of a particular ALU instruction (e.g., (a+b) != 0.0), it may be a further improvement to also copy the preceding ALU instruction. On Intel GPUs, this may enable cmod propagation to make additional progress.

  v2: Use much simpler method to get the prev_block for an if-statement. Suggested by Tim.

  Reviewed-by: Matt Turner <[email protected]>
* nir: Add a shallow clone function for nir_alu_instr (Ian Romanick, 2019-05-31, 2 files, -0/+23)
  Reviewed-by: Jason Ekstrand <[email protected]>
  Suggested-by: Jason Ekstrand <[email protected]>
  Suggested-by: Matt Turner <[email protected]>
* nir: Actually propagate progress in nir_opt_move_load_ubo. (Bas Nieuwenhuizen, 2019-05-31, 1 file, -1/+1)
  Found with Jason's new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

  Fixes: af355aaa071 "nir: add nir_opt_move_load_ubo() optimization pass"
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* nir/split_vars: Properly bail in the presence of complex derefs (Jason Ekstrand, 2019-05-31, 1 file, -9/+106)
  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/vars_to_ssa: Properly ignore variables with complex derefs (Jason Ekstrand, 2019-05-31, 1 file, -14/+64)
  Because the core principle of the vars_to_ssa pass is that it globally (within a function) looks at all of the uses of a never-indirected path and does a full into-SSA on that path, it can't handle a path which has any chance of having aliasing. If a function_temp variable has a cast or anything else which may cause aliasing, we have to assume that all paths to that variable may alias and ignore the entire variable.

  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer (Jason Ekstrand, 2019-05-31, 1 file, -3/+5)
  We're about to change the meaning of get_deref_node returning NULL so we need a non-NULL value to mean properly undefined.

  Reviewed-by: Dave Airlie <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>