aboutsummaryrefslogtreecommitdiffstats
path: root/src/broadcom/compiler
Commit message (Collapse)AuthorAgeFilesLines
* v3d: Add support for gl_HelperInvocation.Eric Anholt2018-12-301-0/+8
| | | | | | We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
* v3d: Add support for textureSize() on MSAA textures.Eric Anholt2018-12-301-0/+1
| | | | | | Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.
* v3d: Add support for non-constant texture offsets.Eric Anholt2018-12-301-8/+24
| | | | | | Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.
* v3d: Force sampling from base level for tg4.Eric Anholt2018-12-301-3/+3
| | | | | | This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
* v3d: Add a note for a potential performance win on multop/umul24.Eric Anholt2018-12-301-0/+4
| | | | Noticed while debugging a testcase.
* v3d: Dead-code eliminate unused flags updates.Eric Anholt2018-12-301-4/+42
| | | | | | | | | The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)
* v3d: Don't generate temps for comparisons.Eric Anholt2018-12-301-12/+14
| | | | | This was just generated work for vir_opt_dead_code and cluttered up the dumps.
* v3d: Move "does this instruction have flags" from sched to generic helpers.Eric Anholt2018-12-304-55/+5
| | | | I wanted to reuse it for DCE of flags updates.
* v3d: Drop incorrect dependency for flpop.Eric Anholt2018-12-301-4/+0
| | | | | It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.
* v3d: Drop unused count_nir_instrs() helper.Eric Anholt2018-12-301-18/+0
| | | | | This was for shader-db, but I haven't cared about NIR instruction counts in a long time.
* v3d: Hook up some shader-db output to GL_ARB_debug_output.Eric Anholt2018-12-303-2/+43
| | | | | | | This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.
* v3d: Fix uniform pretty printing assertion failure with branches.Eric Anholt2018-12-291-0/+3
| | | | Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")
* v3d: Drop shadow comparison state from shader variant key.Eric Anholt2018-12-201-2/+0
| | | | The shadow state is now in the sampler.
* nir/opt_peephole_select: Don't peephole_select expensive math instructionsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* nir/opt_peephole_select: Don't try to remove flow control around indirect loadsIan Romanick2018-12-171-1/+1
| | | | | | | | | | | | | | | | | | | That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* v3d: Fix the argument type for vir_BRANCH().Eric Anholt2018-12-171-1/+1
| | | | | Apparently this has been spewing warnings for Jason's clang, but not my gcc.
* nir: Add a bool to int32 lowering passJason Ekstrand2018-12-161-0/+2
| | | | | | | | We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* nir: Rename Boolean-related opcodes to include 32 in the nameJason Ekstrand2018-12-161-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* v3d: Use the original bit size when scalarizing uniform loads.Eric Anholt2018-12-161-1/+2
| | | | | | Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* v3d: Drop in a bunch of notes about performance improvement opportunities.Eric Anholt2018-12-143-1/+61
| | | | | | These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.
* v3d: Do uniform pretty-printing in the QPU dump.Eric Anholt2018-12-141-1/+47
| | | | | If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.
* v3d: Move uniform pretty-printing to its own helper function.Eric Anholt2018-12-142-71/+77
| | | | I want to reuse it in the QPU dump.
* v3d: Avoid assertion failures when removing end-of-shader instructions.Eric Anholt2018-12-141-0/+6
| | | | | | | | | After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.
* v3d: Make sure that a thrsw doesn't split a multop from its umul24.Eric Anholt2018-12-141-0/+1
| | | | | | | The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
* v3d: Fix a leak of the disassembled instruction string during debug dumps.Eric Anholt2018-12-071-0/+1
| | | | Fixes: ade416d02369 ("broadcom: Add VC5 NIR compiler.")
* v3d: Add VIR dumping of TMU config p0/p1.Eric Anholt2018-12-072-0/+27
| | | | I had a bit of it for V3D 3.x, but didn't update it for 4.x.
* v3d: Simplify VIR uniform dumping using a temporary.Eric Anholt2018-12-071-19/+10
|
* v3d: Garbage collect unused uniforms code.Eric Anholt2018-12-071-2/+0
|
* v3d: Return the right gl_SampleMaskIn[] value.Eric Anholt2018-12-072-3/+1
| | | | | It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.
* v3d: Fix a comment typoEric Anholt2018-12-071-1/+1
|
* v3d: Convert to using nir_src_as_uint() from const_value derefs.Eric Anholt2018-12-071-14/+10
| | | | | Follows 16870de8a0aa ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.
* v3d: Use combined input/output segments.Eric Anholt2018-12-073-1/+34
| | | | | | | The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.
* nir: Make boolean conversions sized just like the othersJason Ekstrand2018-12-051-4/+4
| | | | | | | | | Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* nir: Make nir_lower_clip_vs optionally work with variables.Kenneth Graunke2018-11-191-1/+2
| | | | | | | | | | | | The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <eric@anholt.net>
* v3d: Don't try to set PF flags on a LDTMU operationEric Anholt2018-11-151-0/+6
| | | | | We need an ALU op in order to set PF. Fixes a recent assertion failure in dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex
* v3d: Update the TLB config for depth writes on V3D 4.2.Eric Anholt2018-11-011-8/+22
| | | | Fixes 311 piglit cases on the simulator.
* v3d: Use nir_remove_unused_io_vars to handle binner shader output DCEEric Anholt2018-10-302-45/+13
| | | | | | We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.
* v3d: Only add output slot tracking for the current varying slot.Eric Anholt2018-10-301-1/+1
| | | | | | | | We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.
* v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components.Eric Anholt2018-10-301-0/+16
| | | | | This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.
* v3d: Don't rely on sorting input vars for VPM read setup.Eric Anholt2018-10-301-28/+20
| | | | | | | For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.
* v3d: Split out NIR input setup between FS and VPM.Eric Anholt2018-10-301-47/+80
| | | | | They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.
* util: use C99 declaration in the for-loop hash_table_foreach() macroEric Engestrom2018-10-252-3/+0
| | | | | Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
* v3d: Add support for hardware pack/unpack of half floats.Eric Anholt2018-10-151-0/+16
| | | | | Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.
* v3d: Fix setup of the VCM cache size.Eric Anholt2018-09-071-1/+2
| | | | | | | | | | | There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: 1561e4984eb0 ("v3d: Emit the VCM_CACHE_SIZE packet.")
* v3d: Emit the VCM_CACHE_SIZE packet.Eric Anholt2018-08-062-1/+22
| | | | | | | This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>
* v3d: Avoid spilling that breaks the r5 usage after a ldvary.Eric Anholt2018-08-061-0/+9
| | | | | | Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <mesa-stable@lists.freedesktop.org>
* v3d: Make sure that QPU instruction-has-a-dest matches VIR.Eric Anholt2018-08-062-1/+11
| | | | | | | | | Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <mesa-stable@lists.freedesktop.org>
* v3d: Wait for TMU writes to complete before continuing after a spill.Eric Anholt2018-08-061-1/+6
| | | | | | | | The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <mesa-stable@lists.freedesktop.org>
* v3d: Make sure we don't emit a thrsw before the last one finished.Eric Anholt2018-08-061-2/+13
| | | | | | | Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <mesa-stable@lists.freedesktop.org>
* v3d: Add some debug code for forcing register spilling.Eric Anholt2018-08-061-0/+14
| | | | | | This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.