summaryrefslogtreecommitdiffstats
path: root/src/broadcom
Commit message (Collapse)AuthorAgeFilesLines
* v3d: Fix the check for "is the last thrsw inside control flow"Eric Anholt2019-02-192-8/+17
| | | | | | | | | | The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs. No known tests fixed, noticed while doing a refactor. Fixes: 080506057310 ("v3d: Handle dynamically uniform IF statements with uniform control flow.") (cherry picked from commit 441294962cd65d44febdbe9ef0b0d99b5d27cec8)
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field.Eric Anholt2019-02-193-15/+16
| | | | | | | | | | | | Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <[email protected]> Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.") (cherry picked from commit cd5e0b272919a654079620adecd2abe24ff51233)
* Revert "nir/opt_peephole_select: Don't peephole_select expensive math ↵Dylan Baker2019-02-121-1/+1
| | | | | | | | | | instructions" This reverts commit 378f9967710e9145f2a4f8eee89d87badbe0e6ea. This also remove the default true argument from the a2xx nir backend, which was introduced after this commit. There should be no change in functionality.
* v3d: Fix image_load_store clamping of signed integer stores.Eric Anholt2019-01-311-1/+1
| | | | | | | | | This was copy-and-paste fail, that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats. Fixes: 6281f26f064a ("v3d: Add support for shader_image_load_store.") (cherry picked from commit ab4d5775b0decad7df56245cecad63912ed62b4c)
* vc4: Declare the last cpu pointer as being modified in NEON asm.Emil Velikov2019-01-311-2/+1
| | | | | | | | | | | | Earlier commit addressed 7 of the 8 instances available. v2: Rebase patch back to master (by anholt) Cc: Carsten Haitzler (Rasterman) <[email protected]> Cc: Eric Anholt <[email protected]> Fixes: 300d3ae8b14 ("vc4: Declare the cpu pointers as being modified in NEON asm.") Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 385843ac3ce1b868d9e24fcb2dbc0c8d5f5a7c99)
* automake: Add include dir for nir src directory19.0-branchpointDylan Baker2019-01-291-0/+1
| | | | | | Fixes: 6281f26f064ada36b57d45feb68d8e7d783198c9 ("v3d: Add support for shader_image_load_store.") Reviewed-by: Jordan Justen <[email protected]>
* v3d: Fix the autotools build.Eric Anholt2019-01-291-1/+1
| | | | Noticed while looking at the gitlab-CI MR.
* vc4: Declare the cpu pointers as being modified in NEON asm.Carsten Haitzler (Rasterman)2019-01-281-18/+15
| | | | | | | | | | | | | Otherwise, the compiler is free to reuse the register containing the input for another call and assume that the value hasn't been modified. Fixes crashes on texture upload/download with current gcc. We now have to have a temporary for the cpu2 value, since outputs must be lvalues. (commit message by anholt) Fixes: 4d30024238ef ("vc4: Use NEON to speed up utile loads on Pi2.")
* vc4: Use named parameters for the NEON inline asm.Carsten Haitzler (Rasterman)2019-01-281-80/+100
| | | | | | | This makes the asm code more intelligible and clarifies the functional change in the next commit. (commit message and commit squashing by anholt)
* v3d: Create separate sampler states for the various blend formats.Eric Anholt2019-01-271-4/+4
| | | | | | | | | | | | The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.
* v3d: Use the symbolic names for wrap modes from the XML.Eric Anholt2019-01-271-5/+5
|
* v3d: Drop maximum number of texture units down to 16.Eric Anholt2019-01-271-1/+1
| | | | | | This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
* v3d: Avoid duplicating limits defines between gallium and v3d core.Eric Anholt2019-01-273-5/+43
| | | | | We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.
* v3d: Fix overly-large vattr_sizes structs.Eric Anholt2019-01-271-2/+2
| | | | We want one vector size per vector, not per component.
* v3d: Add support for CS barrier() intrinsics.Eric Anholt2019-01-143-0/+61
|
* v3d: Add support for CS shared variable load/store/atomics.Eric Anholt2019-01-143-13/+83
| | | | | CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.
* v3d: Add support for CS workgroup/invocation id intrinsics.Eric Anholt2019-01-145-1/+67
| | | | | | We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn into the global invocation id and the local invocation id ivec3s.
* v3d: Add support for shader_image_load_store.Eric Anholt2019-01-148-3/+652
| | | | | | This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).
* v3d: Add SSBO/atomic counters support.Eric Anholt2019-01-143-6/+143
| | | | | So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.
* v3d: Add support for matrix inputs to the FS.Eric Anholt2019-01-141-13/+14
| | | | | | We've been relying on linking splitting up our varying matrices into separate vectors, but with SSO that doesn't happen. Supporting matrix inputs isn't too hard, though.
* v3d: Fix txf_ms 2D_ARRAY array index.Eric Anholt2019-01-141-8/+10
| | | | | | We need to pass the array index through our coordinate transform unchanged. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array
* v3d: Add support for the early_fragment_tests flag.Eric Anholt2019-01-141-0/+10
| | | | | If this flag hasn't been set by the shader and it has some visible side effects, then we need to disable EZ.
* v3d: Add support for flushing dirty TMU data at job end.Eric Anholt2019-01-141-0/+23
| | | | This will be needed for SSBOs and image_load_store.
* nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results.Eric Anholt2019-01-081-0/+2
| | | | | | | | V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Use the core tex lowering.Eric Anholt2019-01-043-123/+10
| | | | | | | | Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register. instructions in affected programs: 890712 -> 881974 (-0.98%)
* v3d: Stop scalarizing our uniform loads.Eric Anholt2019-01-042-102/+57
| | | | | | | | | | | | | | | We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well. instructions in affected programs: 3086665 -> 2454967 (-20.47%) uniforms in affected programs: 919581 -> 721039 (-21.59%) threads in affected programs: 1710 -> 3420 (100.00%) spills in affected programs: 596 -> 522 (-12.42%) fills in affected programs: 680 -> 562 (-17.35%) Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
* v3d: Do UBO loads a vector at a time.Eric Anholt2019-01-042-35/+99
| | | | | | | In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and saves us a bunch of round-trips to the TMU by loading a vector at a time.
* v3d: Remove dead switch cases and comments from v3d_nir_lower_io.Eric Anholt2019-01-041-8/+3
| | | | Moving things to NIR left this mess around. All we lower now is uniforms.
* v3d: Reinstate the new shader-db output after v3d_compile() refactor.Eric Anholt2019-01-041-1/+18
| | | | I misplaced it in the rebase conflicts.
* v3d: Refactor compiler entrypoints.Eric Anholt2019-01-022-163/+164
| | | | | | Before, I had per-stage entryoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.
* v3d: Handle dynamically uniform IF statements with uniform control flow.Eric Anholt2019-01-021-1/+65
| | | | | | | | | Loops will be trickier, since we need some analysis to figure out if the breaks/continues inside are uniform. Until we get that in NIR, this gets us some quick wins. total instructions in shared programs: 6192844 -> 6174162 (-0.30%) instructions in affected programs: 487781 -> 469099 (-3.83%)
* v3d: Fold comparisons for IF conditions into the flags for the IF.Eric Anholt2019-01-025-12/+57
| | | | | total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)
* v3d: Don't try to fold non-SSA-src comparisons into bcsels.Eric Anholt2019-01-021-1/+17
| | | | | There could have been a write of a src in between the comparison and the bcsel that would invalidate the comparison.
* v3d: Move the "Find the ALU instruction generating our bool" out of bcsel.Eric Anholt2019-01-021-6/+9
| | | | This will be reused for if statements.
* v3d: Simplify the emission of comparisons for the bcsel optimization.Eric Anholt2019-01-021-37/+24
| | | | | | I wanted to reuse the comparison stuff for nir_ifs, but for that I just want the flags and no destination value. Splitting the conditions from the destinations ended up cleaning the existing code up, anyway.
* v3d: Add support for gl_HelperInvocation.Eric Anholt2018-12-301-0/+8
| | | | | | We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
* v3d: Add support for textureSize() on MSAA textures.Eric Anholt2018-12-301-0/+1
| | | | | | Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.
* v3d: Add support for non-constant texture offsets.Eric Anholt2018-12-301-8/+24
| | | | | | Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.
* v3d: Force sampling from base level for tg4.Eric Anholt2018-12-301-3/+3
| | | | | | This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
* v3d: Add a note for a potential performance win on multop/umul24.Eric Anholt2018-12-301-0/+4
| | | | Noticed while debugging a testcase.
* v3d: Dead-code eliminate unused flags updates.Eric Anholt2018-12-301-4/+42
| | | | | | | | | The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)
* v3d: Don't generate temps for comparisons.Eric Anholt2018-12-301-12/+14
| | | | | This was just generated work for vir_opt_dead_code and cluttered up the dumps.
* v3d: Move "does this instruction have flags" from sched to generic helpers.Eric Anholt2018-12-306-55/+48
| | | | I wanted to reuse it for DCE of flags updates.
* v3d: Drop incorrect dependency for flpop.Eric Anholt2018-12-301-4/+0
| | | | | It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.
* v3d: Drop unused count_nir_instrs() helper.Eric Anholt2018-12-301-18/+0
| | | | | This was for shader-db, but I haven't cared about NIR instruction counts in a long time.
* v3d: Hook up some shader-db output to GL_ARB_debug_output.Eric Anholt2018-12-303-2/+43
| | | | | | | This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.
* v3d: Add a "precompile" debug flag for shader-db.Eric Anholt2018-12-292-0/+2
| | | | | | | | | I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.
* v3d: Fix uniform pretty printing assertion failure with branches.Eric Anholt2018-12-291-0/+3
| | | | Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")
* v3d: Drop shadow comparison state from shader variant key.Eric Anholt2018-12-201-2/+0
| | | | The shadow state is now in the sampler.
* v3d: Add a fallthrough path for utile load/store of 32 byte lines.Eric Anholt2018-12-191-12/+16
| | | | | | Now that V3D has 8 byte per pixel formats exposed, we've got stride==32 utiles to load and store. Just handle them through the non-NEON paths for now.