path: root/src/compiler/nir
Commit message | Author | Age | Files | Lines
* nir: Optimize umod lowering | Sagar Ghuge | 2019-07-26 | 1 | -25/+23
  We don't have to calculate the final quotient in order to calculate the unsigned modulo result. Once we are done with error correction, we have a partial result that can be used to find the result of the modulo operation.
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
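The idea in the message above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the fixed-point reciprocal estimate and the function name `umod_from_estimate` are hypothetical and are not the instruction sequence NIR emits. It shows that once an approximate quotient is known to be at most one too small, the remainder can be corrected directly without ever fixing up the quotient.

```python
def umod_from_estimate(n, d, bits=32):
    """Unsigned n % d from an approximate quotient (illustrative model only)."""
    assert 0 <= n < (1 << bits) and 0 < d < (1 << bits)
    recip = (1 << bits) // d      # fixed-point reciprocal, rounded down
    q_est = (n * recip) >> bits   # estimate: the exact quotient or one less
    r = n - q_est * d             # partial result, guaranteed to lie in [0, 2*d)
    if r >= d:                    # error correction applied to the remainder only
        r -= d
    return r


if __name__ == "__main__":
    print(umod_from_estimate(1000, 7))
```

The quotient estimate is never corrected; only the partial remainder is, which is the saving the commit message describes.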
* nir: add access to image_deref intrinsics | Lionel Landwerlin | 2019-07-26 | 1 | -0/+3
  SPIR-V added the ability to access variables with expressions that are not dynamically uniform. Because spirv_to_nir generates deref instructions, the image_deref intrinsics need to carry that access information.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Cc: <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/algebraic: add scmp algebraic optimizations | Jonathan Marek | 2019-07-24 | 1 | -0/+16
  When 'x' is the result of a scmp op:
    x != 0.0 or x == 1.0: passthrough
    x == 0.0 or x != 1.0: invert
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
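The four rules above can be checked with a tiny truth-table model in Python. This is a sketch, not the nir_opt_algebraic patterns themselves: it assumes an scmp result is always exactly 0.0 or 1.0, and it models "invert" as `1.0 - x`, which is an assumption made for illustration (the log does not show how the inversion is expressed in NIR).

```python
def seq(a, b):
    """Model of a float 'set on equal' op: 1.0 if a == b, else 0.0."""
    return 1.0 if a == b else 0.0


def sne(a, b):
    """Model of a float 'set on not equal' op: 1.0 if a != b, else 0.0."""
    return 1.0 if a != b else 0.0


# (comparison, constant) -> action, following the commit message
RULES = {
    ("sne", 0.0): "passthrough",
    ("seq", 1.0): "passthrough",
    ("seq", 0.0): "invert",
    ("sne", 1.0): "invert",
}


def simplify(cmp_name, const, x):
    """Apply the rule to x, where x is known to be an scmp result (0.0 or 1.0)."""
    action = RULES[(cmp_name, const)]
    return x if action == "passthrough" else 1.0 - x  # hypothetical inversion


if __name__ == "__main__":
    for x in (0.0, 1.0):
        print(x, simplify("seq", 0.0, x), seq(x, 0.0))
```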
* nir/algebraic: add option to lower fall_equalN/fany_nequalN | Jonathan Marek | 2019-07-24 | 2 | -0/+9
  Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal for vec4 backends that don't have any special instructions for them, as long as they support saturate.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: add fdot2 optimizations | Jonathan Marek | 2019-07-24 | 1 | -0/+3
  Add simple fdot2 optimizations that are missing.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Thomas Helland <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: add option to lower fdph | Jonathan Marek | 2019-07-24 | 2 | -1/+6
  For backends that don't have an 'fdph' instruction.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Thomas Helland <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir: replace lower_sincos with algebraic opt | Jonathan Marek | 2019-07-24 | 4 | -142/+12
  This version has fewer ops for the same precision.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Vasily Khoruzhick <[email protected]>
  Acked-by: Matt Turner <[email protected]>
* nir/algebraic: allow swizzle in nir_algebraic replace expression | Jonathan Marek | 2019-07-24 | 4 | -6/+22
  This allows optimizations in nir_opt_algebraic that are not otherwise possible.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Acked-by: Matt Turner <[email protected]>
* nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond) | Daniel Schürmann | 2019-07-24 | 4 | -21/+35
  This will effectively enable the optimization in anv.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_subgroups: Properly lower masks when subgroup_size == 0 | Jason Ekstrand | 2019-07-24 | 1 | -5/+11
  Instead of building a constant mask (which depends on knowing the subgroup size), we build an expression. Because the pass uses the nir_shader_lower_instructions helper, subgroup lowering will be run on any newly emitted instructions as well as the previously existing instructions. In particular, if the subgroup size is known, the newly emitted subgroup_size intrinsic will get turned into a constant and a later constant folding pass will clean it up.
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
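As a rough illustration of "build an expression instead of a constant", the Python sketch below derives the usual eq/ge/gt/le/lt lane masks from a subgroup size that is only an ordinary runtime value. The function name, the 64-bit ballot width and the exact bit tricks are assumptions made for this example; they are not taken from the pass itself.

```python
def subgroup_masks(invocation, subgroup_size, bit_size=64):
    """Lane masks as expressions of a runtime subgroup size (illustrative)."""
    assert 0 < subgroup_size <= bit_size and 0 <= invocation < subgroup_size
    full = (1 << bit_size) - 1
    # Mask of lanes that exist, computed from the size instead of hard-coded.
    group = full >> (bit_size - subgroup_size)
    return {
        "eq": 1 << invocation,
        "ge": (full << invocation) & full & group,
        "gt": (full << (invocation + 1)) & full & group,
        "le": ~(full << (invocation + 1)) & full,
        "lt": ~(full << invocation) & full,
    }


if __name__ == "__main__":
    for name, mask in subgroup_masks(3, 8).items():
        print(name, hex(mask))
```

If the size happens to be a compile-time constant, every term above folds to a constant, which mirrors the clean-up by constant folding that the message mentions.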
* nir: Add lowering for nir_op_irem and nir_op_imod | Sagar Ghuge | 2019-07-24 | 1 | -2/+16
  Tested on Gen > 9.
  v2: 1) Fix lowering
      2) Keep a consistent i/u order (Matt Turner)
  Signed-off-by: Sagar Ghuge <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir/lower_io: Return SSA defs from helpers | Jason Ekstrand | 2019-07-23 | 1 | -25/+42
  I can't find a single place where nir_lower_io is called after going out of SSA, which is the only real reason why you wouldn't do this. Returning SSA defs is more idiomatic and is required for the next commit.
  Reviewed-by: Matt Turner <[email protected]>
* nir/gather_info: Look for uses of helper invocations | Jason Ekstrand | 2019-07-23 | 1 | -0/+19
  The one obvious omission here is gl_HelperInvocation itself. However, the spec doesn't require that we generate them when gl_HelperInvocation is used; it merely mandates that we report them if they are there.
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/gather_info: Move setting uses_64bit out of the switch | Jason Ekstrand | 2019-07-23 | 1 | -5/+6
  Otherwise, as we add things to the switch, we're going to forget and add some 64-bit op at some point in the future and it'll stop getting flagged. There's no reason why we can't do the check for derivatives.
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a nir_tex_instr_has_implicit_derivatives helper | Jason Ekstrand | 2019-07-23 | 2 | -11/+14
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Move nir_alu_instr_is_comparison to the ALU section | Jason Ekstrand | 2019-07-23 | 1 | -23/+23
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir: use | instead of || operator | Andrii Simiklit | 2019-07-23 | 1 | -1/+1
  warning: use of logical '||' with constant operand
  note: use '|' for a bitwise operation
  Fixes: 758fdce9fee ("nir: Add some generic helpers for writing lowering passes")
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Eric Engestrom <[email protected]>
  Signed-off-by: Andrii Simiklit <[email protected]>
* nir: don't return void | Eric Engestrom | 2019-07-23 | 1 | -1/+2
  Fixes: 14531d676b11999123c0 ("nir: make nir_const_value scalar")
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Karol Herbst <[email protected]>
* nir: Remove a bunch of large stack arrays | Jason Ekstrand | 2019-07-22 | 4 | -6/+15
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* nir: Only rematerialize comparisons with all SSA sources | Jason Ekstrand | 2019-07-19 | 1 | -0/+15
  Otherwise, you may end up moving a register read and that could result in an incorrect shader. This commit fixes a rendering issue in Elite: Dangerous.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111152
  Fixes: 3ee2e84c60 "nir: Rematerialize compare instructions"
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* nir: use a switch when printing intrinsic indices | Caio Marcelo de Oliveira Filho | 2019-07-19 | 1 | -8/+32
  Reviewed-by: Eric Engestrom <[email protected]>
* nir/algebraic: mark a few comparison simplifications as precise | Rhys Perry | 2019-07-19 | 1 | -2/+2
  No vkpipeline-db changes found.
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir/algebraic: optimize contradictory iand operands | Rhys Perry | 2019-07-19 | 1 | -0/+6
  Some of these were found in a few GTAV, Rise of the Tomb Raider and Shadow of the Tomb Raider shaders.

  Results from vkpipeline-db run with ACO:
  Totals from affected shaders:
    SGPRS: 376 -> 376 (0.00 %)
    VGPRS: 220 -> 220 (0.00 %)
    Spilled SGPRs: 0 -> 0 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 13492 -> 11560 (-14.32 %) bytes
    LDS: 6 -> 6 (0.00 %) blocks
    Max Waves: 69 -> 69 (0.00 %)
    Wait states: 0 -> 0 (0.00 %)

  v2: use False instead of 0
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir/lower_clip: add support for geometry shaders | Timothy Arceri | 2019-07-19 | 2 | -0/+58
  This will be used to enable compat profile support for geometry shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_clip: add lower_clip_outputs() helper | Timothy Arceri | 2019-07-19 | 1 | -42/+51
  This will be reused in the following patch to add support for clip vertex lowering in geometry shaders.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_clip: add create_clipdist_vars() helper | Timothy Arceri | 2019-07-19 | 1 | -16/+18
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_clip: add a find_clipvertex_and_position_outputs() helper | Timothy Arceri | 2019-07-19 | 1 | -24/+35
  This will allow code sharing in a following patch that adds support for lowering in geometry shaders. It also allows us to exit early if there is no lowering to do, which allows a small code tidy-up.
  Reviewed-by: Kenneth Graunke <[email protected]>
* nir/large_constants: De-duplicate constants | Caio Marcelo de Oliveira Filho | 2019-07-18 | 1 | -21/+75
  If a function has a constant and is called more than once, after inlining we may end up with different variables representing the same constant. This commit looks into the data and de-duplicates them.

  The first pass now collects the constant data in a per-variable buffer, then de-duplication happens (by sorting, then a linear walk), and the second pass uses the data in var->data.location.

  One side effect of the current implementation is that constants will be reordered. If this turns out to be a problem, it can be fixed.

  An alternative strategy considered was to perform this on a per-function basis and then merge the results; the problem is that we would have to fix up the offsets during the merge. Given the data we have, the current patch is good enough.
  Reviewed-by: Jason Ekstrand <[email protected]>
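The "sort, then linear walk" step described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name `dedup_constants` and the dictionary-based interface are hypothetical, and alignment and padding of the packed data are ignored.

```python
def dedup_constants(var_data):
    """Map each variable name to an offset, sharing offsets for equal data.

    var_data: dict of variable name -> bytes holding its constant contents.
    Returns (locations, blob), where locations[name] is the offset into blob.
    """
    # Sorting by contents makes identical constants adjacent.
    ordered = sorted(var_data.items(), key=lambda item: (item[1], item[0]))
    locations = {}
    blob = bytearray()
    prev_data, prev_loc = None, 0
    for name, data in ordered:          # single linear walk over sorted entries
        if prev_data is not None and data == prev_data:
            locations[name] = prev_loc  # duplicate: reuse the earlier location
        else:
            prev_loc = len(blob)
            blob += data
            locations[name] = prev_loc
            prev_data = data
    return locations, bytes(blob)


if __name__ == "__main__":
    locs, blob = dedup_constants({"a": b"\x01\x02", "b": b"\x00", "c": b"\x01\x02"})
    print(locs, blob)
```

Because the walk follows sorted order rather than declaration order, the packed constants end up reordered, which is the side effect the message points out.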
* nir/large_constants: Use ralloc for var_infos | Caio Marcelo de Oliveira Filho | 2019-07-18 | 1 | -3/+3
  This will be used later on to allocate constant data for each variable (and then deduplicate). Also drop initializing found_read, as it is already implicitly false in the literal.
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allow internal changes to the instr in nir_shader_lower_instructions(). | Eric Anholt | 2019-07-18 | 2 | -1/+11
  v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in NIR, but doesn't generate a new txf_ms instruction as a replacement. It's pretty easy to allow that in nir_shader_lower_instructions, and it may be common in lowering passes.
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add gl_PointCoord system value | Andreas Baierl | 2019-07-18 | 3 | -0/+6
  gl_PointCoord handling needs some special bits set in lima/ppir code generation. Treating gl_PointCoord as a system value makes it easier to distinguish from a regular varying.
  Signed-off-by: Andreas Baierl <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_viewport: Check variable mode first | Connor Abbott | 2019-07-18 | 1 | -1/+2
  The location is unused for shader_temp and function_temp variables. Due to the way nir_lower_io_to_temporaries demotes shader_out variables to shader_temp variables, it happened to equal VARYING_SLOT_POS for the gl_Position temporary, which made this pass fail with the offline compiler because it runs before vars_to_ssa.
  Reviewed-by: Qiang Yu <[email protected]>
* nir: add a V3D-specific intrinsic for per-sample color writes | Iago Toral Quiroga | 2019-07-18 | 1 | -0/+9
  For per-sample color writes we need the output intrinsic to pack the sample index, which is not provided with regular store_output intrinsics unless we figured out a way to encode it into the base or the offset.
  v2:
   - Drop the writemask (Eric)
  Reviewed-by: Eric Anholt <[email protected]>
* nir/large_constants: Use dominance information to find more constants | Caio Marcelo de Oliveira Filho | 2019-07-17 | 1 | -6/+30
  Relax the restriction that all the writes need to be in the first block: now accept variables that have all the writes in the same block, and all the reads are dominated by that block.

  This lets the pass identify large constants that are local to a helper function. The writes will be at the place where the function is inlined, possibly not in the first block (but still all in the same block).

  Results for vkpipeline-db in SKL:

    total instructions in shared programs: 3624891 -> 3623145 (-0.05%)
    instructions in affected programs: 79416 -> 77670 (-2.20%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 1458149667 -> 1458147273 (<.01%)
    cycles in affected programs: 30154164 -> 30151770 (<.01%)
    helped: 14
    HURT: 2

    total loops in shared programs: 2437 -> 2437 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 8813 -> 8745 (-0.77%)
    spills in affected programs: 2894 -> 2826 (-2.35%)
    helped: 8
    HURT: 0

    total fills in shared programs: 23470 -> 23392 (-0.33%)
    fills in affected programs: 12248 -> 12170 (-0.64%)
    helped: 6
    HURT: 2

    LOST: 0
    GAINED: 0

  Results for shader-db in SKL with Iris:

    total instructions in shared programs: 15379442 -> 15379392 (<.01%)
    instructions in affected programs: 837 -> 787 (-5.97%)
    helped: 2
    HURT: 2
    helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27
    helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23%
    95% mean confidence interval for instructions value: -39.14 14.14
    95% mean confidence interval for instructions %-change: -15.51% 6.17%
    Inconclusive result (value mean confidence interval includes 0).

    total loops in shared programs: 4880 -> 4880 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 370677237 -> 370676567 (<.01%)
    cycles in affected programs: 17852 -> 17182 (-3.75%)
    helped: 2
    HURT: 1
    helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347
    helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31%
    HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24
    HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18%

    total spills in shared programs: 11772 -> 11772 (0.00%)
    spills in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total fills in shared programs: 24948 -> 24948 (0.00%)
    fills in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST: 0
    GAINED: 0

  Reviewed-by: Jason Ekstrand <[email protected]>
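The relaxed condition in this commit (all writes in one block, every read dominated by that block) can be modelled on a toy control-flow graph. The Python below is a sketch only: the CFG representation, the iterative dominator computation and the name `is_constant_candidate` are assumptions made for illustration, and the ordering of reads and writes inside a single block is ignored.

```python
def dominators(succ, entry):
    """Classic iterative dominator sets; succ maps block -> successor list."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def is_constant_candidate(write_blocks, read_blocks, dom):
    """True if all writes share one block that dominates every read."""
    if len(set(write_blocks)) != 1:
        return False
    write_block = write_blocks[0]
    return all(write_block in dom[r] for r in read_blocks)


if __name__ == "__main__":
    cfg = {"entry": ["w"], "w": ["x", "y"], "x": ["z"], "y": ["z"], "z": []}
    dom = dominators(cfg, "entry")
    print(is_constant_candidate(["w", "w"], ["x", "z"], dom))
```

In this toy graph the writes live in block `w`, which is not the first block, yet the variable still qualifies because `w` dominates every block that reads it.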
* nir/algebraic: Optimize comparisons and up-casts | Jason Ekstrand | 2019-07-17 | 1 | -0/+67
  These seem like obvious enough optimizations in the world of multiple integer bit sizes. The only known thing which hits these at the moment is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast and check for equality. However, it's something that's bound to come up as we start seeing more integers in shaders.

  The optimizations of comparisons of casted values with constants are something which we would ideally do with range analysis. However, lacking that, we can do it in opt_algebraic as long as one side is a constant.

  In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along with the previous commit, reduces the number of instructions emitted on Skylake from 55328 to 44546, a reduction of 20%.
  Acked-by: Matt Turner <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
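To make the "compare an up-cast value with a constant" idea concrete, here is a small Python model of one such rewrite for signed equality. It is a sketch under assumptions: the helper names and the 8-to-32-bit sizes are chosen for illustration, values are treated as raw bit patterns, and the actual patterns added to nir_opt_algebraic are not reproduced here.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits pattern to a to_bits pattern."""
    sign = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    signed = (value ^ sign) - sign
    return signed & ((1 << to_bits) - 1)


def narrow_ieq(c_wide, from_bits=8, to_bits=32):
    """Rewrite `sext(x) == c_wide` as a predicate on the narrow value x."""
    c_narrow = c_wide & ((1 << from_bits) - 1)
    if sext(c_narrow, from_bits, to_bits) != c_wide:
        # The constant is outside the narrow type's range: never equal.
        return lambda x: False
    # Otherwise the comparison can be done at the narrow bit size.
    return lambda x: x == c_narrow


if __name__ == "__main__":
    pred = narrow_ieq(0xFFFFFF80)
    print(pred(0x80), pred(0x7F))
```

The rewrite needs no range analysis because the constant side alone decides whether the comparison can ever succeed.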
* nir/algebraic: Optimize comparing unpacked values | Jason Ekstrand | 2019-07-17 | 1 | -0/+8
  We could, in theory, add the same optimization for 64-bit unpack operations, but that's likely to fight with 64-bit integer lowering on platforms which require it, so it will require more infrastructure before that will be a good idea.
  Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: Print out the list of transforms in the C file | Jason Ekstrand | 2019-07-17 | 1 | -0/+7
  This helps greatly when debugging algebraic transform generators because you can now actually see the output and verify that your transforms are getting generated.
  Acked-by: Matt Turner <[email protected]>
* nir: Fix nir_lower_alu_to_scalar's instr filtering. | Eric Anholt | 2019-07-17 | 1 | -1/+1
  It was checking if the dest or src[0] SSA values were vectors, rather than whether the ALU op was using the source as a vector, resulting in a nir_fdot4 making it through to vc4 and v3d:

    vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5

  Fixes: c1cffa4249ca ("nir/alu_to_scalar: Use the new NIR lowering framework")
  v2: Use Jason's recommendation to look at input_sizes.
  Reviewed-by: Jason Ekstrand <[email protected]>
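The difference between the two filters can be modelled in Python. This is a hedged sketch, not the C code from the pass: the `OP_INPUT_SIZES` table and the dictionary-based instruction layout are hypothetical stand-ins, and the "new" filter only reflects the message's hint to look at input_sizes. The point is that a scalar SSA value swizzled as `.xxxx` still feeds a four-component source of fdot4.

```python
# Hypothetical per-opcode table: fixed source sizes, 0 meaning "per component".
OP_INPUT_SIZES = {
    "fadd": (0, 0),
    "fdot4": (4, 4),
}


def old_filter(instr):
    """Buggy check: looks at the SSA values' component counts."""
    return instr["dest_components"] > 1 or instr["src0_ssa_components"] > 1


def new_filter(instr):
    """Fixed check (as modelled here): looks at how the op consumes src[0]."""
    return instr["dest_components"] > 1 or OP_INPUT_SIZES[instr["op"]][0] > 1


# Model of: vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5  (ssa_4 is a scalar value)
fdot4_of_scalar = {"op": "fdot4", "dest_components": 1, "src0_ssa_components": 1}

if __name__ == "__main__":
    print(old_filter(fdot4_of_scalar), new_filter(fdot4_of_scalar))
```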
* nir/regs_to_ssa: Handle regs in phi sources properly | Jason Ekstrand | 2019-07-16 | 1 | -2/+32
  Sources of phi instructions act as if they occur at the very end of the predecessor block, not the block in which the phi lives. In order to handle them correctly, we have to skip phi sources on the normal instruction walk and handle them as a separate walk over the successor phis.

  While registers in phi instructions are a bit of an oddity, they can occur when we temporarily go out-of-SSA for control-flow manipulations.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111075
  Cc: [email protected]
  Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir/lower_doubles: Handle fdiv and fsub directly | Jason Ekstrand | 2019-07-16 | 2 | -2/+17
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_doubles: Use the new NIR lowering framework | Jason Ekstrand | 2019-07-16 | 1 | -72/+65
  One advantage of this is that we no longer need to run in a loop because the new framework handles lowering instructions added by lowering.
  Reviewed-by: Eric Anholt <[email protected]>
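Why the outer loop becomes unnecessary can be seen in a toy worklist model. The Python below is a sketch and not the implementation of nir_shader_lower_instructions: the function `lower_instructions`, the tuple-based instructions and the made-up opcodes `mul4`, `mul2` and `add` are all hypothetical. What it shows is that replacement instructions are fed back through the same lowering callback before the walk moves on.

```python
def lower_instructions(instrs, lower):
    """Lower a list of instructions, revisiting anything the callback emits.

    lower(instr) returns a list of replacement instructions, or None to keep
    the instruction unchanged.
    """
    out = []
    work = list(reversed(instrs))        # stack: next instruction is on top
    while work:
        instr = work.pop()
        replacement = lower(instr)
        if replacement is None:
            out.append(instr)
        else:
            # Newly emitted instructions are lowered too, in program order.
            work.extend(reversed(replacement))
    return out


def toy_lower(instr):
    """Two-level toy lowering: mul4 -> 2 x mul2, mul2 -> 2 x add."""
    if instr == ("mul4",):
        return [("mul2",), ("mul2",)]
    if instr == ("mul2",):
        return [("add",), ("add",)]
    return None


if __name__ == "__main__":
    print(lower_instructions([("mul4",)], toy_lower))
```

A single call fully lowers the two-level chain, so the caller does not need to repeat the pass until nothing changes.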
* nir/lower_doubles: Use "alu" for the nir_alu_instr | Jason Ekstrand | 2019-07-16 | 1 | -15/+15
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_int64: Use the core NIR lowering framework | Jason Ekstrand | 2019-07-16 | 1 | -74/+49
  One advantage of this is that we no longer need to run in a loop because the new framework handles lowering instructions added by lowering.
  Reviewed-by: Eric Anholt <[email protected]>
* nir/alu_to_scalar: Use the new NIR lowering framework | Jason Ekstrand | 2019-07-16 | 1 | -93/+54
  Reviewed-by: Eric Anholt <[email protected]>
* nir/alu_to_scalar: Use "alu" as the name for the nir_alu_instr | Jason Ekstrand | 2019-07-16 | 1 | -50/+50
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_system_values: Support lowering more intrinsics | Jason Ekstrand | 2019-07-16 | 1 | -87/+83
  Instead of only lowering system values from variables, lower most to intrinsics and let the lowering framework immediately lower the intrinsic. This will result in a bit more instruction churn, but it means that NIR code builders can just use intrinsics instead of everything having to go through variables.
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_system_values: Drop the context-aware builder functions | Jason Ekstrand | 2019-07-16 | 1 | -97/+96
  Instead of having context-aware builder functions, just provide lowering for the system value intrinsics and let nir_shader_lower_instructions handle the recursion for us. This makes everything a bit simpler and means that the lowering can also be used if something comes in as a system value intrinsic rather than a load_deref.
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_system_values: Use the new generic NIR lowering helpers | Jason Ekstrand | 2019-07-16 | 1 | -96/+55
  Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_subgroups: Use the new generic NIR lowering helpers | Jason Ekstrand | 2019-07-16 | 1 | -45/+14
  Reviewed-by: Eric Anholt <[email protected]>
* nir: Add some generic helpers for writing lowering passes | Jason Ekstrand | 2019-07-16 | 2 | -0/+192
  Reviewed-by: Eric Anholt <[email protected]>