path: root/src/broadcom
Commit message | Author | Age | Files | Lines
* v3d: Move i2b and f2b support into emit_comparison. | Eric Anholt | 2019-02-18 | 1 | -13/+12
  This lets us save a resolve to NIR true/false for ifs and discard_if. No change in shader-db.
* v3d: Emit a simpler negate for the iabs implementation. | Eric Anholt | 2019-02-18 | 1 | -2/+1
  One program affected in my shader-db.
  instructions in affected programs: 110 -> 108 (-1.82%)
* v3d: Delay emitting ldvpm on V3D 4.x until it's actually used. | Eric Anholt | 2019-02-18 | 1 | -6/+43
  For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need to do VPM setup when the load_inputs are out of order. For V3D 4.x, we can reduce register pressure by delaying our loads until they're actually needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump.
  total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
  total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
  total threads in shared programs: 153864 -> 153906 (0.03%)
* v3d: Stop tracking num_inputs for VPM loads. | Eric Anholt | 2019-02-18 | 4 | -7/+2
  It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.
* v3d: Add a function to describe what the c->execute.file check means. | Eric Anholt | 2019-02-18 | 2 | -8/+14
  This is what pointed out that we were misusing the check for last_thrsw in the previous commit.
* v3d: Fix the check for "is the last thrsw inside control flow" | Eric Anholt | 2019-02-18 | 2 | -8/+17
  The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs. No known tests fixed, noticed while doing a refactor.
  Fixes: 080506057310 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
* v3d: Fix f2b32 behavior. | Eric Anholt | 2019-02-18 | 1 | -1/+6
  Now that we don't have the vir_PF() magic, it's obvious that we were doing the wrong thing for f2b32 by allowing -0.0 to produce true instead of false.
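  The -0.0 case can be illustrated in plain C. This is a sketch rather than the driver's code: `f2b_by_bits` and `f2b_by_compare` are hypothetical names, and the raw-bits test stands in for any integer-style zero check applied to a float value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Integer-style test (hypothetical): inspects the raw bits, so -0.0
 * (bit pattern 0x80000000) counts as nonzero and yields true. */
static bool f2b_by_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits != 0;
}

/* Float-style test (hypothetical): -0.0 compares equal to 0.0,
 * so it yields false, which is the result f2b32 should give. */
static bool f2b_by_compare(float f)
{
    return f != 0.0f;
}
```

  The two helpers agree on every input except -0.0, which is why a flag-setting instruction that silently switches between int and float semantics is easy to get wrong.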
* v3d: Kill off vir_PF(), which is hard to use right. | Eric Anholt | 2019-02-18 | 3 | -70/+36
  You were allowed to pass in any old temp so that you could hopefully fold the PF up into the def of the temp. If we couldn't find one, it implicitly generated a MOV(nop, reg). However, that PF could have different behavior depending on whether the def being folded into was a float or int opcode, which the caller doesn't necessarily control.
  Due to the fragility of the function, just switch all callers over to vir_set_pf(). This also encourages the callers to use a _dest call for the inst they're putting the PF on, eliminating a bunch of temps in the pre-optimization VIR.
  shader-db says the change is in the noise:
  total instructions in shared programs: 6226247 -> 6227184 (0.02%)
  instructions in affected programs: 851068 -> 852005 (0.11%)
* v3d: Do bool-to-cond for discard_if as well. | Eric Anholt | 2019-02-18 | 1 | -16/+12
  Turns this minimal conditional discard (glsl-fs-discard-01.shader_test):
      0x3de0b086c5fe9000 fcmp.pushn -, r1, r5; mov r2, 0
      0x3dec3086bbfc001f nop ; mov.ifa r2, -1
      0x3c047186bbe80000 nop ; mov.pushz -, r2
      0x3dea3186ba837000 setmsf.ifna -, 0 ; nop
  into:
      0x3c00b186c582a000 fcmp.pushn -, r2, r5; nop
      0x3de83186ba837000 setmsf.ifa -, 0 ; nop
  total instructions in shared programs: 6229820 -> 6226247 (-0.06%)
* v3d: Refactor bcsel and if condition handling. | Eric Anholt | 2019-02-18 | 1 | -29/+14
  Both were doing the same thing to try to get a condition to predicate on. Noticed when I wanted to do this for discard_if as well. No change in shader-db.
* v3d: Add a helper function for getting a nop register. | Eric Anholt | 2019-02-18 | 5 | -14/+18
  Just a little refactor to explain what's going on with QFILE_NULL.
* v3d: Drop our hand-lowered nir_op_ffract. | Eric Anholt | 2019-02-18 | 1 | -3/+1
  The NIR lowering works fine, though it causes some slight noise due to what looks like choices about propagating constants up multiply chains changing.
  total instructions in shared programs: 6229671 -> 6229820 (<.01%)
  total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
* v3d: Drop a perf note about merging unpack_half_*, which has been implemented. | Eric Anholt | 2019-02-18 | 1 | -3/+0
  This is handled with copy-propagation now.
* v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x. | Eric Anholt | 2019-02-18 | 1 | -5/+4
  Fixes some stalls in 3DMMES's main vertex shader.
  total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
  instructions in affected programs: 2935050 -> 2865569 (-2.37%)
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. | Eric Anholt | 2019-02-18 | 3 | -15/+16
  Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation.
  Signed-off-by: Eric Anholt <[email protected]>
  Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
* v3d: Use the NIR lowering for isign instead of rolling our own. | Eric Anholt | 2019-02-14 | 1 | -16/+1
  min/max instead of comparisons saves 2 instructions on fs-sign-int.shader_test.
* v3d: Whitespace consistency fix. | Eric Anholt | 2019-02-05 | 1 | -1/+1
* v3d: Fix copy-propagation of input unpacks. | Eric Anholt | 2019-02-05 | 5 | -35/+94
  I had a single function for "does this do float input unpacking" with two major flaws: it was missing the most common thing to try to copy propagate an f32 input unpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and it would also try to propagate an f32 unpack into a VFMUL, which only does f16 unpacks.
  instructions in affected programs: 659232 -> 655895 (-0.51%)
  uniforms in affected programs: 132613 -> 135336 (2.05%)
  and a couple of programs increase their thread counts.
  The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.
* v3d: Fix input packing of .l for rounding/fdx/fdy. | Eric Anholt | 2019-02-05 | 2 | -1/+2
  Avoids a regression in dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start copy-propagating more input packs.
* v3d: Fix pack/unpack of VFPACK operand unpacks. | Eric Anholt | 2019-02-05 | 2 | -1/+33
  We want to be able to copy propagate our texture unpacks into the vfpack.
* v3d: Fix dumping of shaders with alpha test. | Eric Anholt | 2019-02-05 | 1 | -1/+3
  We were trying to print a NULL entry from the table.
* v3d: Store the actual mask of color buffers present in the key. | Eric Anholt | 2019-02-05 | 2 | -5/+6
  If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
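  The difference between tracking a count and tracking a mask can be sketched as follows. The names `MAX_RTS`, `color_buffer_mask`, and `rt_present` are hypothetical and are not taken from the driver; the point is only that a per-bit mask can represent "rt1 bound, rt0 absent", which a simple count cannot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RTS 4 /* hypothetical render-target limit for this sketch */

/* Build a bitmask with bit i set only when color buffer i is bound. */
static uint8_t color_buffer_mask(const void *const cbufs[], unsigned count)
{
    uint8_t mask = 0;
    for (unsigned i = 0; i < count && i < MAX_RTS; i++) {
        if (cbufs[i] != NULL)
            mask |= (uint8_t)(1u << i);
    }
    return mask;
}

/* A shader variant would emit a write for rt only when its bit is set. */
static bool rt_present(uint8_t mask, unsigned rt)
{
    return (mask & (1u << rt)) != 0;
}
```

  With only render target 1 bound, the mask is 0x2, so no write is generated for the missing rt0.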
* v3d: Fix image_load_store clamping of signed integer stores. | Eric Anholt | 2019-01-31 | 1 | -1/+1
  This was copy-and-paste fail, that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats.
  Fixes: 6281f26f064a ("v3d: Add support for shader_image_load_store.")
* v3d: Fix a release build set-but-unused compiler warning. | Eric Anholt | 2019-01-29 | 1 | -1/+2
* vc4: Declare the last cpu pointer as being modified in NEON asm. | Emil Velikov | 2019-01-29 | 1 | -2/+1
  Earlier commit addressed 7 of the 8 instances available.
  v2: Rebase patch back to master (by anholt)
  Cc: Carsten Haitzler (Rasterman) <[email protected]>
  Cc: Eric Anholt <[email protected]>
  Fixes: 300d3ae8b14 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
  Signed-off-by: Emil Velikov <[email protected]>
* automake: Add include dir for nir src directory [tag: 19.0-branchpoint] | Dylan Baker | 2019-01-29 | 1 | -0/+1
  Fixes: 6281f26f064ada36b57d45feb68d8e7d783198c9 ("v3d: Add support for shader_image_load_store.")
  Reviewed-by: Jordan Justen <[email protected]>
* v3d: Fix the autotools build. | Eric Anholt | 2019-01-29 | 1 | -1/+1
  Noticed while looking at the gitlab-CI MR.
* vc4: Declare the cpu pointers as being modified in NEON asm. | Carsten Haitzler (Rasterman) | 2019-01-28 | 1 | -18/+15
  Otherwise, the compiler is free to reuse the register containing the input for another call and assume that the value hasn't been modified. Fixes crashes on texture upload/download with current gcc.
  We now have to have a temporary for the cpu2 value, since outputs must be lvalues.
  (commit message by anholt)
  Fixes: 4d30024238ef ("vc4: Use NEON to speed up utile loads on Pi2.")
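  The constraint issue can be shown with a small GCC-style inline asm sketch. It is not the vc4 code: `copy8_advance` is a hypothetical helper, and the NEON path is only compiled on 32-bit ARM with NEON enabled, with a portable fallback elsewhere. The writeback addressing form (`[reg]!`) modifies the pointer register, so the operand is declared read-write (`"+r"`) instead of input-only.

```c
#include <stdint.h>
#include <string.h>

/* Copy 8 bytes from src to dst and return the advanced source pointer.
 * Hypothetical helper for illustration only. */
static const uint8_t *copy8_advance(uint8_t *dst, const uint8_t *src)
{
#if defined(__arm__) && defined(__ARM_NEON)
    /* "!" post-increments both address registers, so both pointers are
     * read-write operands; named operands keep the asm readable. */
    __asm__ volatile(
        "vld1.8 {d0}, [%[src]]!\n"
        "vst1.8 {d0}, [%[dst]]!\n"
        : [src] "+r"(src), [dst] "+r"(dst)
        :
        : "d0", "memory");
#else
    /* Portable fallback with the same observable effect. */
    memcpy(dst, src, 8);
    src += 8;
    dst += 8;
#endif
    (void)dst;
    return src;
}
```

  Because a read-write operand must be an lvalue, an expression such as `cpu + 8` cannot be passed directly; it first has to be stored in a local variable, which matches the temporary the commit message mentions for cpu2.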
* vc4: Use named parameters for the NEON inline asm. | Carsten Haitzler (Rasterman) | 2019-01-28 | 1 | -80/+100
  This makes the asm code more intelligible and clarifies the functional change in the next commit.
  (commit message and commit squashing by anholt)
* v3d: Create separate sampler states for the various blend formats. | Eric Anholt | 2019-01-27 | 1 | -4/+4
  The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's unorm/snorm/int range by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those.
  We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.
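  A minimal sketch of driver-side border color preparation, assuming the conventional [0, 1] range for unorm and [-1, 1] for snorm. The enum, the function names, and the exact BGRA channel swap are assumptions made for illustration, not the v3d implementation.

```c
/* Hypothetical classification of the format's value range. */
enum border_kind { BORDER_UNORM, BORDER_SNORM, BORDER_FLOAT };

/* Clamp one border color channel to the range the format can represent. */
static float clamp_border_channel(float v, enum border_kind kind)
{
    switch (kind) {
    case BORDER_UNORM:
        return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    case BORDER_SNORM:
        return v < -1.0f ? -1.0f : (v > 1.0f ? 1.0f : v);
    default:
        return v; /* float formats are passed through in this sketch */
    }
}

/* Pre-swizzle an RGBA border color for a texture sampled as BGRA
 * (assumed here to be a swap of the red and blue channels). */
static void preswizzle_bgra(const float in[4], float out[4])
{
    out[0] = in[2];
    out[1] = in[1];
    out[2] = in[0];
    out[3] = in[3];
}
```

  Each distinct clamp and swizzle combination gives a different encoded border color, which is why separate sampler state variants are needed when the border color is nonzero.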
* v3d: Use the symbolic names for wrap modes from the XML. | Eric Anholt | 2019-01-27 | 1 | -5/+5
* v3d: Drop maximum number of texture units down to 16. | Eric Anholt | 2019-01-27 | 1 | -1/+1
  This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
* v3d: Avoid duplicating limits defines between gallium and v3d core. | Eric Anholt | 2019-01-27 | 3 | -5/+43
  We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.
* v3d: Fix overly-large vattr_sizes structs. | Eric Anholt | 2019-01-27 | 1 | -2/+2
  We want one vector size per vector, not per component.
* v3d: Add support for CS barrier() intrinsics. | Eric Anholt | 2019-01-14 | 3 | -0/+61
* v3d: Add support for CS shared variable load/store/atomics. | Eric Anholt | 2019-01-14 | 3 | -13/+83
  CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.
* v3d: Add support for CS workgroup/invocation id intrinsics. | Eric Anholt | 2019-01-14 | 5 | -1/+67
  We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn them into the global invocation id and the local invocation id ivec3s.
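  The arithmetic behind that lowering follows the standard GLSL compute relations and can be sketched as below. The struct and function names are hypothetical, and unsigned components are used here for simplicity; this is not the NIR pass itself.

```c
#include <stdint.h>

struct uvec3 { uint32_t x, y, z; };

/* Recover the 3D local invocation id from the flat local invocation
 * index, given the workgroup size (x varies fastest). */
static struct uvec3 local_id_from_index(uint32_t index, struct uvec3 size)
{
    struct uvec3 id;
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}

/* global id = workgroup id * workgroup size + local id, per component. */
static struct uvec3 global_id(struct uvec3 wg, struct uvec3 size,
                              struct uvec3 local)
{
    struct uvec3 id;
    id.x = wg.x * size.x + local.x;
    id.y = wg.y * size.y + local.y;
    id.z = wg.z * size.z + local.z;
    return id;
}
```

  For a 4x2x2 workgroup, local index 13 maps to local id (1, 1, 1), and in workgroup (2, 0, 1) that invocation has global id (9, 1, 3).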
* v3d: Add support for shader_image_load_store. | Eric Anholt | 2019-01-14 | 8 | -3/+652
  This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).
* v3d: Add SSBO/atomic counters support. | Eric Anholt | 2019-01-14 | 3 | -6/+143
  So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.
* v3d: Add support for matrix inputs to the FS. | Eric Anholt | 2019-01-14 | 1 | -13/+14
  We've been relying on linking splitting up our varying matrices into separate vectors, but with SSO that doesn't happen. Supporting matrix inputs isn't too hard, though.
* v3d: Fix txf_ms 2D_ARRAY array index. | Eric Anholt | 2019-01-14 | 1 | -8/+10
  We need to pass the array index through our coordinate transform unchanged. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array
* v3d: Add support for the early_fragment_tests flag. | Eric Anholt | 2019-01-14 | 1 | -0/+10
  If this flag hasn't been set by the shader and it has some visible side effects, then we need to disable EZ.
* v3d: Add support for flushing dirty TMU data at job end. | Eric Anholt | 2019-01-14 | 1 | -0/+23
  This will be needed for SSBOs and image_load_store.
* nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. | Eric Anholt | 2019-01-08 | 1 | -0/+2
  V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
  Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Use the core tex lowering. | Eric Anholt | 2019-01-04 | 3 | -123/+10
  Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register.
  instructions in affected programs: 890712 -> 881974 (-0.98%)
* v3d: Stop scalarizing our uniform loads. | Eric Anholt | 2019-01-04 | 2 | -102/+57
  We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well.
  instructions in affected programs: 3086665 -> 2454967 (-20.47%)
  uniforms in affected programs: 919581 -> 721039 (-21.59%)
  threads in affected programs: 1710 -> 3420 (100.00%)
  spills in affected programs: 596 -> 522 (-12.42%)
  fills in affected programs: 680 -> 562 (-17.35%)
  Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
* v3d: Do UBO loads a vector at a time. | Eric Anholt | 2019-01-04 | 2 | -35/+99
  In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and saves us a bunch of round-trips to the TMU by loading a vector at a time.
* v3d: Remove dead switch cases and comments from v3d_nir_lower_io. | Eric Anholt | 2019-01-04 | 1 | -8/+3
  Moving things to NIR left this mess around. All we lower now is uniforms.
* v3d: Reinstate the new shader-db output after v3d_compile() refactor. | Eric Anholt | 2019-01-04 | 1 | -1/+18
  I misplaced it in the rebase conflicts.
* v3d: Refactor compiler entrypoints. | Eric Anholt | 2019-01-02 | 2 | -163/+164
  Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.