summaryrefslogtreecommitdiffstats
path: root/src/compiler/nir
Commit message (Collapse)AuthorAgeFilesLines
* nir: Support deref instructions in lower_var_copiesJason Ekstrand2018-06-223-5/+143
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a deref path helper structJason Ekstrand2018-06-223-0/+113
| | | | | | | | | | | | | | | | | | This commit introduces a new nir_deref.h header for helpers that are less common and really only needed by a few heavy-duty passes. In this header is a new struct for representing a full deref path which can be walked in either direction. v2 (Jason Ekstrand): - Assert that deref != NULL (Caio) - Fill _short_path with 0xdeadbeef in debug builds when not used (Caio) - Make nir_deref_path a typedef (Rob) Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Support deref instructions in lower_io_to_temporariesJason Ekstrand2018-06-221-0/+2
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Support deref instructions in lower_global_vars_to_localJason Ekstrand2018-06-221-23/+42
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a pass for fixing deref modesJason Ekstrand2018-06-222-0/+32
| | | | | | | | | | | This will be needed by anything which changes variable modes without rewriting derefs. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Support deref instructions in remove_dead_variablesJason Ekstrand2018-06-221-2/+92
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: add deref lowering sanity checkingRob Clark2018-06-2232-0/+78
| | | | | | | | | | | | | This will be removed at the end of the transition, but add some tracking plus asserts to help ensure that lowering passes are called at the correct point (pre or post deref instruction lowering) as passes are converted and the point where lower_deref_instrs() is called is moved. Signed-off-by: Rob Clark <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/deref: Add some deref cleanup functionsJason Ekstrand2018-06-222-0/+57
| | | | | | | | | | | Sometimes it's useful for a pass to be able to clean up its own derefs instead of waiting for DCE. This little helper makes it very easy. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add helpers for working with deref instructionsJason Ekstrand2018-06-224-0/+362
| | | | | | | | | | | This commit adds a pass for lowering deref instructions to deref chains as well as some smaller helpers to ease the transition. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add deref sources to texture instructionsJason Ekstrand2018-06-222-0/+8
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add _deref versions of all of the _var intrinsicsJason Ekstrand2018-06-224-1/+144
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/builder: Add deref building helpersJason Ekstrand2018-06-221-0/+106
| | | | | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a deref instruction typeJason Ekstrand2018-06-229-13/+580
| | | | | | | | | | | | This commit adds a new instruction type to NIR for handling derefs. Nothing uses it yet but this adds the data structure as well as all of the code to validate, print, clone, and [de]serialize them. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/validate: Rework intrinsic type validationJason Ekstrand2018-06-221-31/+31
| | | | | | | | | | | | | This moves the switch statement for specific intrinsics above source and destination validation. We also rework the source and destination validation to use different bit_size values for each source and/or destination. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Acked-by: Rob Clark <[email protected]> Acked-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Dave Airlie <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add explicit_binding to nir_variableNeil Roberts2018-06-211-0/+5
| | | | | | | This is copied from the corresponding value in ir_variable. The intention is to eventually use it in a pure-NIR linker. Reviewed-by: Timothy Arceri <[email protected]>
* nir: add pass to move load_constRob Clark2018-06-193-0/+143
| | | | | | | | | | | | Run this pass late (after opt loop) to move load_const instructions back into the basic blocks which use the result, in cases where a load_const is only consumed in a single block. This helps reduce register usage in cases where the backend driver cannot lower the load_const to a uniform. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: add comment for loop_unroll passRob Clark2018-06-191-0/+4
| | | | | | | | Save the next person from digging through the code to figure out what the indirect_mask parameter actually does. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* nir: Document a couple instances of parent_instrIan Romanick2018-06-151-0/+2
| | | | | | | | | | | | | | nir_ssa_def::parent_instr and nir_src::parent_instr have the same name, but they mean really different things. I choose to save the next person the hour+ that I just spent figuring that out. Even now that I know, I doubt I'd notice in code review that someone typed foo->parent_instr when they actually meant foo->ssa->parent_instr. v2: Minor wording tweak in nir_ssa_def::parent_instr. Suggested by Jason. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add global invocation id intrinsic.Plamena Manolova2018-06-072-0/+5
| | | | | | | | Add the missing nir intrinsic for the gl_GlobalInvocationID compute shader variable. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nir: add opt_if_loop_terminator()Timothy Arceri2018-06-071-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | This pass detects potential loop terminators and moves intructions from the non breaking branch after the if-statement. This enables both the new opt_if_simplification() pass and loop unrolling to potentially progress further. Unexpectedly this change speed up shader-db run times by ~3% Ivy Bridge shader-db results (all changes in dolphin/ubershaders): total instructions in shared programs: 9995662 -> 9995338 (-0.00%) instructions in affected programs: 87845 -> 87521 (-0.37%) helped: 27 HURT: 0 total cycles in shared programs: 230931495 -> 230925015 (-0.00%) cycles in affected programs: 56391385 -> 56384905 (-0.01%) helped: 27 HURT: 0 Reviewed-by: Ian Romanick <[email protected]>
* nir: move ends_in_break() helper to nir_loop_analyze.hTimothy Arceri2018-06-072-13/+13
| | | | | | | We will use the helper while simplifying potential loop terminators in the following patch. Reviewed-by: Ian Romanick <[email protected]>
* nir: Look into uniform structs for samplers when counting num_textures.Eric Anholt2018-06-061-12/+44
| | | | | | | | | | | | | | mesa/st decides whether to update samplers after a program change based on whether num_textures is nonzero. By not counting samplers in a uniform struct, we would segfault in KHR-GLES3.shaders.struct.uniform.sampler_vertex if it was run in the same context after a non-vertex-shader-uniform testcase (as is the case during a full conformance run). v2: Implement using two separate pure functions instead of updating pointers. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add lowering for nir_op_bit_count.Eric Anholt2018-06-062-0/+38
| | | | | | | | | This is basically the same as the GLSL lowering path. v2: Fix typo in the link Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add lowering for nir_op_bitfield_reverse.Eric Anholt2018-06-062-1/+48
| | | | | | | This is basically the same as the GLSL lowering path. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add an ALU lowering pass for mul_high.Eric Anholt2018-06-063-0/+169
| | | | | | | | This is based on the glsl/lower_instructions.cpp implementation, but should be much more readable. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add lowering for find_lsb.Eric Anholt2018-06-062-0/+6
| | | | | | | There is a fairly simple relation to turn this into ufind_msb. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add lowering for ifind_msb to ufind_msb.Eric Anholt2018-06-062-0/+6
| | | | | | | | ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb to that. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add lowering from ibitfield_extract/ubitfield_extract to shifts.Eric Anholt2018-06-062-0/+19
| | | | | | | | V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to glsl/lower_instructions.cpp. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: Add lowering for bitfieldInsert without using bfi.Eric Anholt2018-06-062-0/+19
| | | | | | | | | | | If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes things harder than keeping the "bits" argument around. This still uses bfm, but I've added the obvious lowering of bfm if you need it. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: implement the GLSL equivalent of if simplication in nir_opt_ifSamuel Pitoiset2018-06-041-5/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass turns: if (cond) { } else { do_work(); } into: if (!cond) { do_work(); } else { } Here's the vkpipeline-db stats (from affected shaders) on Polaris10: Totals from affected shaders: SGPRS: 17272 -> 17296 (0.14 %) VGPRS: 18712 -> 18740 (0.15 %) Spilled SGPRs: 1179 -> 1142 (-3.14 %) Code Size: 1503364 -> 1515176 (0.79 %) bytes Max Waves: 916 -> 911 (-0.55 %) This pass only affects Serious Sam 2017 (Vulkan) on my side. The stats are not really good for now. Some shaders look quite dumb but this will be improved with further NIR passes, like ifs combination. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: make is_comparison() a non-static helper functionSamuel Pitoiset2018-06-042-25/+25
| | | | | | | | | Rename and change the prototype for consistency regarding nir_tex_instr_is_query(). This function will be used in the following patch. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: use num_components wrappers in print/validate.Dave Airlie2018-06-042-15/+5
| | | | | | These wrappers were introduces, so start using them. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Lower !f2b(x) to x == 0.0Ian Romanick2018-06-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | Some trivial help now, but it also prevents ~40 regressions caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. All Gen4+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14369557 -> 14369555 (<.01%) instructions in affected programs: 442 -> 440 (-0.45%) helped: 2 HURT: 0 total cycles in shared programs: 532425772 -> 532425743 (<.01%) cycles in affected programs: 6086 -> 6057 (-0.48%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Add some missing "optimization undo" patternsIan Romanick2018-06-011-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d8d18516b0a and 03fb13f6467 added some patterns to undo conversions like (('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c))) If further optimization cause some of the operands to either be the same or be constants, undoing the transformation can lead to further savings. I don't know why these patterns were not added in those patches. I did not check to see which specific patterns actually helped. I just added all of them for symmetry. This prevents some loop unrolling regressions Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if simplication in nir_opt_if" patch. Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 14369768 -> 14369557 (<.01%) instructions in affected programs: 44076 -> 43865 (-0.48%) helped: 141 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1 helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60% 95% mean confidence interval for instructions value: -1.67 -1.32 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 532430629 -> 532425772 (<.01%) cycles in affected programs: 1170832 -> 1165975 (-0.41%) helped: 101 HURT: 5 helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32 helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03% HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4 HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01% 95% mean confidence interval for cycles value: -53.64 -38.00 95% mean confidence interval for cycles %-change: -3.06% -2.20% Cycles are helped. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* mesa: Add GL/GLSL plumbing for ARB_fragment_shader_interlock.Plamena Manolova2018-06-011-0/+2
| | | | | | | | | | | | | This extension provides new GLSL built-in functions beginInvocationInterlockARB() and endInvocationInterlockARB() that delimit a critical section of fragment shader code. For pairs of shader invocations with "overlapping" coverage in a given pixel, the OpenGL implementation will guarantee that the critical section of the fragment shader will be executed for only one fragment at a time. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* nir: optimize iand(ieq(a, 0), ieq(b, 0)) to ieq(ior(a, b), 0)Samuel Pitoiset2018-05-311-0/+2
| | | | | | | | | | | | | | Totals from affected shaders: SGPRS: 80 -> 80 (0.00 %) VGPRS: 48 -> 48 (0.00 %) Code Size: 2120 -> 2096 (-1.13 %) bytes Max Waves: 16 -> 16 (0.00 %) Only two Rise of Tomb Raider shaders are affected on my side. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* nir: add unsigned comparison simplificationsTimothy Arceri2018-05-301-0/+2
| | | | | | | This avoids loop unrolling regressions in Wolfenstein II on DXVK with an upcoming optimisation series from Samuel. Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/print: fix printing of 8/16 bit constant variablesKarol Herbst2018-05-291-0/+31
| | | | | | | v2 (Jose Maria Casanova Crespo <[email protected]>): add float16 support Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
* nir: Implement optional b2f->iand loweringAlyssa Rosenzweig2018-05-182-1/+7
| | | | | | | | | | | | | | | | | | This pass is required by the Midgard compiler; our instruction set uses NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction. Normally, this lowering pass would be implemented in a backend-specific algebraic pass, but this conflicts with the existing iand->b2f pass in nir_opt_algebraic.py, hanging the compiler. This patch thus makes the existing pass optional (default on -- all other backends should remain unaffected), adding an optional pass for lowering the opposite direction. v2: Defer lowering until late algebraic optimisations to allow optimising the b2f instruction itself. Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* anv,nir: add generated files to .gitignore(s)Rhys Perry2018-05-121-0/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir/format_convert: Add code for bitcasting vectorsJason Ekstrand2018-05-091-0/+53
| | | | | | | This is a fairly direct port from blorp. The only real change is that the nir_format_convert version doesn't assume that everything is a vec4. Reviewed-by: Topi Pohjolainen <[email protected]>
* nir/format_convert: Add a function to pack RGB9_E5 formatsJason Ekstrand2018-05-091-0/+64
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* nir/format_convert: Add pack/unpack for R11F_G11F_B10FJason Ekstrand2018-05-091-0/+38
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* nir/format_convert: Add linear <-> sRGB helpersJason Ekstrand2018-05-091-0/+26
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* nir: Add the start of a format conversion helper headerJason Ekstrand2018-05-092-0/+107
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* nir: Transform discard_if(true) into discardMatt Turner2018-05-071-1/+16
| | | | | | | | | | | | | | | | Noticed while reviewing Tim Arceri's NIR inlining series. Without his series: instructions in affected programs: 16 -> 14 (-12.50%) helped: 2 With his series: instructions in affected programs: 196 -> 174 (-11.22%) helped: 22 Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/lower_64bit_packing: rename the pass to be more genericIago Toral Quiroga2018-05-033-5/+5
| | | | | | It can do 32-bit packing too now. Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_64bit_packing: extend the pass to handle packing from / to 16-bit.Iago Toral Quiroga2018-05-031-5/+59
| | | | | | | With 16-bit support we can now do 32-bit packing, a follow-up patch will rename the pass to something more generic. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add opcodes for 16-bit packing and unpackingIago Toral Quiroga2018-05-031-0/+19
| | | | | | | | | | Noitice that we don't need 'split' versions of the 64-bit to / from 16-bit opcodes which we require during pack lowering to implement these operations. This is because these operations can be expressed as a collection of 32-bit from / to 16-bit and 64-bit to / from 32-bit operations, so we don't need new opcodes specifically for them. Reviewed-by: Jason Ekstrand <[email protected]>
* compiler/nir: add a lowering pass to convert the bit size of ALU operationsIago Toral Quiroga2018-05-033-0/+133
| | | | | | | | | | | | | | | | Not all bit-sizes may be supported natively in hardware for all operations. This pass allows drivers to lower such operations to a bit-size that is actually supported and then converts the result back to the original bit-size. Compiler backends control which operations and wich bit-sizes require the lowering through a callback function. v2: generalize this pass and make it available in NIR core (Rob, Jason) v3: remove some temporaries and reduce nesting in instruction loop using a continue statement (Jason) Reviewed-by: Jason Ekstrand <[email protected]>