summaryrefslogtreecommitdiffstats
path: root/src/freedreno/ir3
Commit message (Collapse)AuthorAgeFilesLines
* freedreno/ir3: turn on [iu]mul_highRob Clark2019-03-081-0/+4
| | | | | | | | | | Which also requires uadd_carry lowering Until recently this was lowered in glsl ir so it went unnoticed that we weren't lowering it. Fixes: 1d8994a63b5 glsl: [u/i]mulExtended optimization for GLSL Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: track register pressure in schedRob Clark2019-03-032-9/+90
| | | | | | | | | | | | | | | Not a perfect solution, and the "pressure" target is hard-coded. But it doesn't really seem to much in the common case, and avoids exploding register usage in dEQP ssbo tests. So this should serve as a stop-gap solution until I have time to re- write the scheduler. Hurts slightly in instruction count, but gains (reduces) slightly the register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.* that were failing due to RA fail. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: add Sethi–Ullman numbering passRob Clark2019-03-037-4/+141
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: include nopN in expanded instruction countRob Clark2019-03-031-1/+1
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno: Fix a couple of warningsKristian H. Kristensen2019-02-281-1/+1
| | | | Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
* freedreno/ir3: gsampler2DMSArray fixesRob Clark2019-02-262-30/+36
| | | | | | | | | Array index should come before sample-id. And exclude all isam variants (which take integer texel coords) from adding of offset. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.use_texture_*_2d_array Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3/a6xx: fix atomic shader outputsRob Clark2019-02-261-0/+8
| | | | | | | | | | We also need to put in the output mov. Possibly we could just fixup the output register to read it directly from the dummy, but that is more work and I guess dEQP is probably the only time you encounter this. Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/a6xx: vertex_id is not _zero_basedRob Clark2019-02-261-0/+23
| | | | | | Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3/a6xx: fix non-ssa atomic dstRob Clark2019-02-261-0/+5
| | | | | | | | | | We weren't propagating the array info for cases where result of atomic is array/reg. This can happen, for example, if result is part of a phi web lowered to regs. Fixes dEQP-GLES31.functional.ssbo.atomic.compswap.* Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: use nopN encoding when possibleRob Clark2019-02-263-6/+35
| | | | | | | | | | | | | Use the (nopN) encoding for slightly denser shaders.. this lets us fold nop instructions into the previous alu instruction in certain cases. Shouldn't change the # of cycles a shader takes to execute, but reduces the size. (ex: glmark2 refract goes from 168 to 116 instructions) Currently only enabled for a6xx, but I think we could enable this for a5xx and possibly a4xx. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: don't hardcode wrmaskRob Clark2019-02-221-5/+6
| | | | | | | | | Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow and few other similar tests that do multiple texture fetches into individual components of a packet output. Mostly works around the issue mentioned in ra_block_find_definers(). Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/a6xx: samplerBuffer fixesRob Clark2019-02-201-2/+11
| | | | | | | | | | | | | | Use the 'UNK31' bit (which should probably be called 'BUFFER') for samplerBuffer case, which increases the size of supported buffer texture beyond 2^15 elements. Also need to fix the 2nd coord injected to handle the tex instructions that take integer coords. Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071 and similar Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3/a6xx: use ldib for ssbo readsRob Clark2019-02-201-24/+10
| | | | | | | | | | | | | ... instead of isam. It seems like when using isam, plus atomics, we can have the problem of old data being in the texture cache. Plus this way we don't have to load a component at a time. Note that blob still seems to use isam in some cases. I suppose it might be preferable in the case of loading a single component, when atomics are not in the picture (or that the ssbo does not need to otherwise be coherent). Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: sync instr/disasm and add ldib encodingRob Clark2019-02-204-14/+42
| | | | | | | | | | Resync disasm and instr header from envytools, and add ldib encoding. This replaces an opcode from a3xx which was never seen in practice, since that seemed easier than dealing with the same opc # meaning a different thing on a6xx. (Not really sure if 'sti' was actually a real thing, I think it was only seen in fuzzing.) Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3/a6xx: fix load_ssbo barrier type.Rob Clark2019-02-201-2/+2
| | | | | | | Silly copy/pasta bug, since load_image is actually the same instruction but different barrier class. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: rename put_dst()Rob Clark2019-02-203-9/+9
| | | | | | | This was overlooked when it moved to ir3_context.c and ceased to be static.. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno: fix crash w/ masked non-SSA dstRob Clark2019-02-201-0/+2
| | | | | | | | | Fixes dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read regression. Fixes: c1a27ba9baf freedreno/ir3: HIGH reg w/a for a6xx Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix crash in compile fail caseRob Clark2019-02-201-1/+1
| | | | | | | The variant will be NULL if RA failed. Which isn't ideal, but at least lets not segfault and bring down the rest of the dEQP run with us. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix legalize for vecN inputsRob Clark2019-02-202-0/+3
| | | | | | | | | The wrmask is handled in regmask_get()/regmask_set(), but it wasn't being propagated from SSA src to dst. So for example, an SSBO read value that is passed in as src2.y component to atomic op, wasn't getting the (sy) flag set. Causing lots of fail. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: handle quirky atomic dst for a6xxRob Clark2019-02-181-3/+12
| | | | | | | | | | | | | The new encoding returns a value via the 2nd src. The legalize pass needs to be aware of this to set the correct needs_sy flag, otherwise we can, in cases where the atomic dst is not used, overwrite the register that hardware will asynchronously load result into without (sy) flag, so it gets clobbered by the atomic result. This fixes a whole lot of rando ssbo+atomic fails, like dEQP-GLES31.functional.ssbo.layout.single_basic_type.packed.highp_vec4. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: HIGH reg w/a for a6xxRob Clark2019-02-164-3/+26
| | | | | | | | | | | | | It seems like some instructions (noticed this w/ cat3), cannot read HIGH regs.. cat1 (mov/cov) can, and possibly some/all of cat2. The blob seems to stick w/ an extra mov into low regs. So lets do the same. This fixes WGID on a6xx, which unsurprisingly is related to a lot of deqp compute fails. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: add a6xx+ SSBO/image supportRob Clark2019-02-166-2/+483
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: add a6xx instruction encodingRob Clark2019-02-161-0/+90
| | | | | | For the handful of instructions that use a new encoding. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: add image/ssbo <-> ibo/tex mappingRob Clark2019-02-166-25/+141
| | | | | | | | | | Images and SSBOs don't map directly to the hw. They end up being part texture and part something else. Starting with a6xx, the hack used for a5xx to smash the image tex state into hw texture state starting from MAX counting down won't work, because we start using tex state also for SSBO read. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix ncomp for _store_image() srcRob Clark2019-02-161-2/+3
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: split out a4xx+ instructionsRob Clark2019-02-166-332/+393
| | | | | | | Note that image/ssbo support is currently only implemented for a5xx. But the instruction encoding is the same for a4xx. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: split out image helpersRob Clark2019-02-165-183/+251
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix varying packing vs. tex sharp edgeRob Clark2019-02-161-2/+30
| | | | | | | | We probably need to rethink how we detect which instruction first defines higher register classes. But for now, this at least fixes the symptom. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno: Use the NIR lowering for isign.Eric Anholt2019-02-142-14/+1
| | | | | | | I think this will save an instruction and hopefully not increase any other costs (possibly the immediate -1 and 1?), but I haven't actually tested. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
* mesa: add MESA_SHADER_KERNELKarol Herbst2019-01-211-0/+1
| | | | | | | | used for CL kernels Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* nir: rename nir_var_function to nir_var_function_tempKarol Herbst2019-01-191-1/+1
| | | | | | | | Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* nir: rename global/local to private/function memoryKarol Herbst2019-01-081-1/+1
| | | | | | | | | | | | | | | | | | the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* freedreno/ir3: don't treat all inputs/outputs as vec4Rob Clark2018-12-222-14/+38
| | | | | | | | | This was a hold-over from the early TGSI days, and mostly not needed with NIR. This avoids burning an entire 4 consecutive scalar regs for vec3 outputs, for example. Which fixes a few places that we were doing worse that we should on register usage. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix fallout of extra assertRob Clark2018-12-211-1/+1
| | | | | | | | | | | | | | | Fixes the following crash that happened after d6110d4d The problem happens if we first compile a "vanilla" shader with nothing lowered in NIR, which perform the final lowering passes on so->shader-> nir (including nir_lower_locals_to_regs()), and then later we have compile a shader with some lowering. The second time through we would have already done nir_lower_locals_to_regs(). Arguably this was already a bug, just one we hadn't noticed yet. Fixes: d6110d4d547 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: Handle GL_NONE in get_num_components_for_glformat()Eduardo Lima Mitev2018-12-191-3/+8
| | | | | | | | | | An earlier patch that introduced the function failed to handle the case where an image format layout qualifier is not specified, which is allowed on desktop GL profiles. In these cases, nir_variable's image format is GL_NONE, and we don't need to print a debug message for those. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: Make imageStore use num components from image formatEduardo Lima Mitev2018-12-181-2/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | emit_intrinsic_store_image() is always using 4 components when collecting registers for the value. When image has less than 4 components (e.g, r32f, rg32i, etc) this results in extra mov instructions. This patch uses the actual number of components from the image format. For example, in a shader like: layout (r32f, binding=0) writeonly uniform imageBuffer u_image; ... void main(void) { ... imageStore (u_image, some_offset, vec4(1.0)); ... } instruction count is reduced in at least 3 instructions (note image format is r32f, 1 component only). This obviously reduces register pressure as well. v2: - Added support for image formats from NV_image_format extension (Ilia Mirkin). - Return 4 components by default instead of asserting. (Rob Clark). v3: Added more missing formats (Ilia Mirkin). v4: Added a debug message for unknown image formats (Rob Clark). Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>
* nir/opt_peephole_select: Don't peephole_select expensive math instructionsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loadsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | | | | That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* nir: Add a bool to int32 lowering passJason Ekstrand2018-12-161-0/+1
| | | | | | | | We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* freedreno/ir3: don't remove unused input componentsRob Clark2018-12-131-1/+7
| | | | | Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: fix crashRob Clark2018-12-131-14/+8
| | | | | | | Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno: debug GEM obj namesRob Clark2018-12-131-1/+3
| | | | | | | With a recent enough kernel, set debug names for GEM BOs, which will show up in $debugfs/gem Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: track max flow control depth for a5xx/a6xxRob Clark2018-12-073-0/+33
| | | | | | Rather than just hard-coding BRANCHSTACK size. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: code-motionRob Clark2018-12-075-838/+940
| | | | | | | Split up ir3_compiler_nir.c a bit before starting to add new stuff for a6xx SSBO/image instructions. Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: sync instr/disasmRob Clark2018-12-073-24/+131
| | | | Signed-off-by: Rob Clark <robdclark@gmail.com>
* freedreno/ir3: don't fetch unused tex componentsRob Clark2018-12-072-0/+29
| | | | | | | Detect when a component of an (for example) texture fetch is unused and propagate the updated wrmask back to the parent instruction. Signed-off-by: Rob Clark <robdclark@gmail.com>
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-4/+7
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* freedreno: move ir3 to common locationRob Clark2018-11-2721-0/+13688
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be re-used by some future vulkan driver. The parts that are gallium specific have been refactored out and remain in the gallium driver. Getting the move done now so that it can happen before further refactoring to support a6xx specific instructions. NOTE also removes ir3_cmdline compiler tool from autotools build since that was easier than fixing it and I normally use meson build. Waiting patiently for the day that we can remove *everything* from the autotools build. Signed-off-by: Rob Clark <robdclark@gmail.com>