aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel/compiler/brw_shader.cpp
Commit message (Collapse)AuthorAgeFilesLines
* intel: Use SATURATEAlyssa Rosenzweig2020-05-261-2/+2
| | | | | | Signed-off-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5100>
* intel/ir: Use brw::performance object instead of CFG cycle counts for ↵Francisco Jerez2020-04-281-2/+5
| | | | | | | | | | | | codegen stats. These should be more accurate than the current cycle counts, since among other things they consider the effect of post-scheduling passes like the software scoreboard on TGL. In addition it will enable us to clean up some of the now redundant cycle-count estimation functionality in the instruction scheduler. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: CSEL can do saturateIan Romanick2020-04-171-0/+1
| | | | | Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/fs: Allow multiple slots for positionCaio Marcelo de Oliveira Filho2020-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | Change brw_compute_vue_map() to also take the number of pos slots. If more than one slot is used, the VARYING_SLOT_POS is treated as an array. When using Primitive Replication, instead of a single position, the VUE must contain an array of positions. Padding might be necessary (after clip distance) to ensure rest of attributes start aligned. v2: Add note about array in the commit message and assert that pos_slots >= 1 to make clear 0 is invalid. (Jason) Move padding to be after the clip distance. v3: Apply the correct offset when gathering the sources from outputs. Reviewed-by: Jason Ekstrand <[email protected]> [v2] Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
* intel/compiler: Pass shader_stats for each SIMD modeMatt Turner2020-03-091-3/+2
| | | | | | | | | | | | | | | | | | | | Passing shader_stats to the fs_generator constructor means that the SIMD8 shader stats from the visitor (such as the scheduler mode) will be reported out for the SIMD16/SIMD32 versions as well. As you can see, we are now passing 'shader_stats' and 'stats' to generate_code(), which is obviously odd looking. Ian rebased and committed an old patch of mine which added the shader_stats struct on July 30 in commit dabb5d4bee07 (i965/fs: Add a shader_stats struct.) and shortly after on August 12 Jason added the brw_compile_stats struct in commit 134607760ac2 (intel/compiler: Fill a compiler statistics struct). I'd like to combine the two, but I'm not sure how. shader_stats is an input to generate_code() while brw_compile_stats is an output and is only used by the Vulkan driver. Leave it as is for now... Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
* intel/compiler: Pass backend_shader * to cfg_t()Matt Turner2020-03-091-1/+1
| | | | | | | | As you can see, not having a pointer to the backend_shader from within the class makes for some weird looking code. Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
* intel/compiler: Mark some methods and parameters constMatt Turner2020-03-091-2/+2
| | | | | Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
* intel/compiler: Move idom tree calculation and related logic into analysis ↵Francisco Jerez2020-03-061-1/+2
| | | | | | | | | | | | | object This only does half of the work. The actual representation of the idom tree is left untouched, but the computation algorithm is moved into a separate analysis result class wrapped in a BRW_ANALYSIS object, along with the intersect() and dump_domtree() auxiliary functions in order to keep things tidy. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/compiler: Drop invalidate_live_intervals()Francisco Jerez2020-03-061-2/+0
| | | | | Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/compiler: Introduce backend_shader method to propagate IR changes to ↵Francisco Jerez2020-03-061-0/+7
| | | | | | | | | | | | | | | | | analysis passes The invalidate_analysis() method knows what analysis passes there are in the back-end and calls their invalidate() method to report changes in the IR. For the moment it just calls invalidate_live_intervals() (which will eventually be fully replaced by this function) if anything changed. This makes all optimization passes invalidate DEPENDENCY_EVERYTHING, which is clearly far from ideal -- The dependency classes passed to invalidate_analysis() will be refined in a future commit. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
* intel/fs: Add virtual instruction to load mask of live channels into flag ↵Francisco Jerez2020-02-141-0/+3
| | | | | | | register. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Cc: 20.0 <[email protected]>
* intel/compiler: Add names for SHADER_OPCODE_[IU]SUB_SATCaio Marcelo de Oliveira Filho2020-01-241-0/+4
| | | | | | | | Fixes: 58907568ec5 ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops") Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3558> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3558>
* intel/fs: Add FS_OPCODE_SCHEDULING_FENCECaio Marcelo de Oliveira Filho2020-01-211-0/+3
| | | | | | | | | | | | Like a SHADER_OPCODE_MEMORY_FENCE but doesn't doesn't generate any assembly code. Will be used when the compiler shouldn't reorder certain instructions but there's no need to generate code for the HW to do it -- as the ordering will be guaranteed by other means. Reviewed-by: Francisco Jerez <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
* intel/fs: Add DWord scattered read/write opcodesJason Ekstrand2019-11-111-0/+6
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/ir/gen12: Add SYNC hardware instruction.Francisco Jerez2019-10-111-0/+1
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir: Drop hard-coded correspondence between IR and HW opcodes.Francisco Jerez2019-10-111-1/+1
| | | | | | | | | | | | | Having the IR opcodes locked to their hardware representation is risky because it causes opcodes as different as BRC and IFF to compare equal at the IR level (luckily the back-end only ever uses one opcode from each group, right now), and it prevents us from supporting instructions that change their hardware representation across generations, which will become a problem on Gen12+ platforms. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs/generator: add new opcode to set float controls modes in control ↵Samuel Iglesias Gonsálvez2019-09-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | register Before this commit, we had only FPRoundingMode decoration (the per instruction one) that is applied during the SPIR-V handling. In vtn_alu we find out the rounding mode, and generate the code accordingly that later will be used to look for the respective nir_op_f2f16_{rtz,rtne}. Per-instruction gets prioritized because we make them explicit conversions (with RTZ or RTNE nir opcodes) and they will override the default execution mode defined with float controls. However, we need to come back to the mode defined by float controls after the execution of the FP Rounding instruction. Therefore, the new SHADER_OPCODE_FLOAT_CONTROL_MODE opcode will be used to set the default rounding mode and denorms treatment in the whole shader while the pre-existent SHADER_OPCODE_RND_MODE, will be used as prioritized rounding mode in a per-instruction basis. v2: - Fix bug in defining BRW_CR0_FP_MODE_MASK. v3: - Update comment (Caio). v4: - Split the patch into the helper and the new opcode (this one) (Caio). v5: - Add an explanation on the actual purpose and priority of the newly introduced opcode in the commit log (Caio). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Drop the gl_program from fs_visitorJason Ekstrand2019-08-251-2/+1
| | | | | | | | | It's not used by anything anymore now that so much lowering has been moved into NIR. Sadly, we still need on in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Fill a compiler statistics structJason Ekstrand2019-08-121-2/+3
| | | | | | | | | This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when we INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/fs: Add a shader_stats struct.Matt Turner2019-07-301-1/+1
| | | | | | | | | | It'll grow further, and we'd like to avoid adding an additional parameter to fs_generator() for each new piece of data. v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Be more conservative about subgroup sizes in GLJason Ekstrand2019-07-241-1/+1
| | | | | | | | | | | The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Make brw_nir_apply_sampler_key more genericJason Ekstrand2019-07-241-1/+1
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Add a "base class" for program keysJason Ekstrand2019-07-101-2/+2
| | | | | | | | | Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Stop returning the shader from helpersJason Ekstrand2019-06-051-2/+2
| | | | | | | | Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Add an UNDEF instruction to avoid excess live rangesJason Ekstrand2019-06-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | With 8 and 16-bit types and anything where we have to use non-trivial strides registersto deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <[email protected]>
* anv: Implement VK_KHR_shader_atomic_int64Jason Ekstrand2019-04-191-0/+3
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bitIago Toral Quiroga2019-04-181-0/+26
| | | | | | | | | | | | There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir: Take a nir_tex_instr and src index in brw_texture_offsetJason Ekstrand2019-04-141-8/+16
| | | | | This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.
* intel/common: move gen_debug to intel/devMark Janes2019-04-101-1/+1
| | | | | | | | | libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCTCaio Marcelo de Oliveira Filho2019-03-231-1/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Re-prefix non-logical surface opcodes with VEC4Jason Ekstrand2019-02-281-6/+6
| | | | | | | | The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so we no longer need the non-logical opcodes there. Prefix them VEC4 so it's clear that they're only used by the vec4 back-end. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Drop unused surface opcodesJason Ekstrand2019-02-281-18/+0
| | | | | | | | | The unused typed surface read/write support in the vec4 back-end has been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all image and buffer ops. There's no reason to keep these opcodes around anymore. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Get rid of the IMAGE_SIZE opcodeJason Ekstrand2019-02-281-2/+0
| | | | | | | Since switching to SHADER_OPCODE_SEND for image operations, we no longer need the non-logical opcode. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Implement nir_intrinsic_global_atomic_*Jason Ekstrand2019-02-011-0/+6
| | | | eviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Implement load/store_global with A64 untyped messagesJason Ekstrand2019-02-011-0/+12
| | | | eviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+Jason Ekstrand2019-01-291-2/+0
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use a logical opcode for IMAGE_SIZEJason Ekstrand2019-01-291-0/+2
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Add a generic SEND opcodeJason Ekstrand2019-01-291-0/+9
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes.Francisco Jerez2019-01-091-4/+0
| | | | | | | | These are broken on a future platform, but it turns out we don't need to fix them, since they're just type-converting moves with strided source. Kill them. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Do NIR shader cloning in the caller.Kenneth Graunke2018-11-201-2/+1
| | | | | | | | | | | | This moves nir_shader_clone() to the driver-specific compile function, rather than the shared src/intel/compiler code. This allows i965 to do key-specific passes before calling brw_compile_*. Vulkan should not need this cloning as it doesn't compile multiple variants. We do need to continue cloning in the compute shader code because we lower various things in NIR based on the SIMD width. Reviewed-by: Alejandro Piñeiro <[email protected]>
* intel: Use TXS for image_size when we have a typed surfaceJason Ekstrand2018-08-291-0/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Implement untyped atomic float min, max, and compare-swap ↵Ian Romanick2018-08-221-0/+6
| | | | | | | | | | dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Silence unused parameter warnings brw_nir.cIan Romanick2018-07-021-1/+1
| | | | | | | | | | | | | | src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’: src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter] bool is_scalar) ^~~~~~~~~ src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’: src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter] lower_bit_size_callback(const nir_alu_instr *alu, void *data) ^~~~ Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Get rid of MOV_DISPATCH_TO_FLAGSJason Ekstrand2018-06-281-2/+0
| | | | | | We can just emit the MOV in the two places where we use this. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinatesJason Ekstrand2018-06-281-1/+2
| | | | | | | | | | | | | | | | | | | | | On g4x through Sandy Bridge, src1 (the coordinates) of the PLN instruction is required to be an even register number. When it's odd (which can happen with SIMD32), we have to emit a LINE+MAC combination instead. Unfortunately, we can't just fall through to the gen4 case because the input registers are still set up for PLN which lays out the four src1 registers differently in SIMD16 than LINE. v2 (Jason Ekstrand): - Take advantage of both accumulators and emit LINE LINE MAC MAC (Based on a patch from Francisco Jerez) - Unify the gen4 and gen4x-6 cases using a loop v3 (Jason Ekstrand): - Don't unify gen4 with gen4x-6 as this turns out to be more fragile than first thought without reworking the gen4 barycentric coordinate layout. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLNJason Ekstrand2018-06-281-1/+2
| | | | | | | | | | | | | | When we don't have PLN (gen4 and gen11+), we implement LINTERP as either LINE+MAC or a pair of MADs. In both cases, the accumulator is written by the first of the two instructions and read by the second. Even though the accumulator value isn't actually ever used from a logical instruction perspective, it is trashed so we need to make the scheduler aware. Otherwise, the scheduler could end up re-ordering instructions and putting a LINTERP between another an instruction which writes the accumulator and another which tries to use that result. Cc: [email protected] Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Remove program key argument from generator.Francisco Jerez2018-06-281-1/+1
| | | | | Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: FS_OPCODE_REP_FB_WRITE has side effectsJason Ekstrand2018-06-281-0/+1
| | | | | | | It doesn't matter since we don't ever run replicated write shaders through the optimizer but it's good to be complete. Reviewed-by: Matt Turner <[email protected]>
* i965: Add ARB_fragment_shader_interlock support.Plamena Manolova2018-06-011-0/+4
| | | | | | | | | Adds suppport for ARB_fragment_shader_interlock. We achieve the interlock and fragment ordering by issuing a memory fence via sendc. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* intel/fs: Replace the CINTERP opcode with a simple MOVFrancisco Jerez2018-05-291-5/+1
| | | | | | | | | | | | | The only reason it was it's own opcode was so that we could detect it and adjust the source register based on the payload setup. Now that we're using the ATTR file for FS inputs, there's no point in having a magic opcode for this. v2 (Jason Ekstrand): - Break the bit which removes the CINTERP opcode into its own patch Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>