path: root/src/broadcom
Commit message | Author | Age | Files | Lines
...
* v3d: Bump the maximum texture size to 4k for V3D 4.x. | Eric Anholt | 2019-04-04 | 2 | -3/+4
  4.1 and 4.2 both have the same 16k limit, but I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working.
  Cc: [email protected]
* v3d: Remove some dead members of struct v3d_compile. | Eric Anholt | 2019-03-21 | 1 | -12/+0
  These are more vc4 leftovers.
* v3d: Upload all of UBO[0] if any indirect load occurs. | Eric Anholt | 2019-03-21 | 3 | -129/+1
  The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it.
  No change in shader-db. No change on 3DMMES2 (n=15).
* v3d: Move constant offsets to UBO addresses into the main uniform stream. | Eric Anholt | 2019-03-21 | 3 | -9/+16
  We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU.
  shader-db:
  total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
  total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
* v3d: Rename v3d_tmu_config_data to v3d_unit_data. | Eric Anholt | 2019-03-21 | 2 | -9/+9
  I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.
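The reuse described above amounts to packing a unit index and a small constant offset into one 32-bit uniform-stream word. A minimal sketch, assuming an 8-bit unit / 24-bit offset split; the helper names are hypothetical and not copied from the v3d source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: unit index in bits 31..24, offset in bits 23..0. */
static inline uint32_t unit_data_create(uint32_t unit, uint32_t offset)
{
        assert(unit < (1u << 8));
        assert(offset < (1u << 24));
        return (unit << 24) | offset;
}

/* Recover the unit (e.g. a UBO/SSBO index) from a packed word. */
static inline uint32_t unit_data_get_unit(uint32_t data)
{
        return data >> 24;
}

/* Recover the small constant byte offset from a packed word. */
static inline uint32_t unit_data_get_offset(uint32_t data)
{
        return data & 0xffffff;
}
```

Under this assumed layout, unit_data_create(1, 16) yields 0x01000010, and a CPU-side uniform writer could decode both fields and add the offset to the buffer address once, rather than the shader doing the add.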
* v3d: Fix leak of the mem_ctx after the DAG refactor. | Eric Anholt | 2019-03-12 | 1 | -2/+2
  Noticed while trying to get a CTS run again.
  Fixes: 33886474d646 ("v3d: Use the DAG datastructure for QPU instruction scheduling.")
* v3d: Use the DAG datastructure for QPU instruction scheduling. | Eric Anholt | 2019-03-11 | 1 | -114/+72
  Just a small code reduction from shared infrastructure.
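The general technique behind DAG-based instruction scheduling is list scheduling: an instruction becomes ready once every instruction it depends on has been scheduled. A minimal sketch with a fixed-size adjacency matrix; it illustrates the technique only and does not use Mesa's shared DAG API, and it picks the lowest-numbered ready node where a real scheduler would use a cost heuristic:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* edge[a][b] is true when instruction b depends on instruction a. */
struct dag {
        int num_nodes;
        bool edge[MAX_NODES][MAX_NODES];
};

/* Writes a schedule into order[] and returns how many nodes were
 * scheduled (num_nodes unless the graph contains a cycle). */
static int dag_list_schedule(const struct dag *d, int order[])
{
        int parents_left[MAX_NODES] = {0};
        bool done[MAX_NODES] = {false};
        int count = 0;

        assert(d->num_nodes <= MAX_NODES);

        /* Count unscheduled parents for every node. */
        for (int a = 0; a < d->num_nodes; a++)
                for (int b = 0; b < d->num_nodes; b++)
                        if (d->edge[a][b])
                                parents_left[b]++;

        while (count < d->num_nodes) {
                int pick = -1;
                for (int n = 0; n < d->num_nodes; n++) {
                        if (!done[n] && parents_left[n] == 0) {
                                pick = n;
                                break;
                        }
                }
                if (pick < 0)
                        break; /* cycle: nothing is ready */

                done[pick] = true;
                order[count++] = pick;

                /* Scheduling pick may make its children ready. */
                for (int b = 0; b < d->num_nodes; b++)
                        if (d->edge[pick][b])
                                parents_left[b]--;
        }
        return count;
}
```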
* v3d: Reuse list_for_each_entry_rev(). | Eric Anholt | 2019-03-11 | 1 | -2/+2
* v3d: Include a count of register pressure in the RA failure dumps. | Eric Anholt | 2019-03-06 | 1 | -1/+13
  You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.
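One way to compute such a pressure count is to walk the instruction indices and count how many live intervals cover each one; the peak tells you where to start looking. A minimal sketch, assuming half-open [start, end) intervals indexed by instruction position; the struct and function names are hypothetical:

```c
#include <assert.h>

/* Hypothetical live interval: the temp is live for ip in [start, end). */
struct live_interval {
        int start;
        int end;
};

/* Returns the maximum number of simultaneously live temps and stores the
 * first instruction index where that maximum occurs in *peak_ip. */
static int max_register_pressure(const struct live_interval *temps,
                                 int num_temps, int num_ips, int *peak_ip)
{
        int best = 0;

        assert(num_temps >= 0 && num_ips >= 0);
        *peak_ip = 0;

        for (int ip = 0; ip < num_ips; ip++) {
                int live = 0;
                for (int t = 0; t < num_temps; t++) {
                        if (temps[t].start <= ip && ip < temps[t].end)
                                live++;
                }
                if (live > best) {
                        best = live;
                        *peak_ip = ip;
                }
        }
        return best;
}
```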
* v3d: Drop the V3D 3.x vpm read dead code elimination. | Eric Anholt | 2019-03-05 | 1 | -33/+2
  We now have NIR dead code eliminating our VPM reads, so this shouldn't be necessary.
* v3d: Eliminate the TLB and TLBU files. | Eric Anholt | 2019-03-05 | 4 | -41/+20
  We can just use the magic register file like we do for other magic waddrs.
* v3d: Use ldunif instructions for uniforms. | Eric Anholt | 2019-03-05 | 11 | -270/+27
  The idea is that for repeated use of the same uniform, we could avoid loading it on each consumer. The results look pretty good.
  total instructions in shared programs: 6413571 -> 6521464 (1.68%)
  total threads in shared programs: 154214 -> 154000 (-0.14%)
  total uniforms in shared programs: 2393604 -> 2119629 (-11.45%)
  total spills in shared programs: 4960 -> 4984 (0.48%)
  total fills in shared programs: 6350 -> 6418 (1.07%)
  Once we do scheduling at the NIR level, the register pressure (and thus also instructions) issues we see here will drop back down.
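The shader-db summaries throughout this log report the relative change (after - before) / before as a percentage. A small helper for sanity-checking the quoted figures:

```c
#include <assert.h>

/* Relative change in percent, as printed in shader-db summary lines. */
static double percent_change(double before, double after)
{
        assert(before != 0.0);
        return (after - before) / before * 100.0;
}
```

For example, the uniform totals above give percent_change(2393604, 2119629) of about -11.446, which rounds to the reported -11.45%.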
* v3d: Add support for register-allocating a ldunif to a QFILE_TEMP. | Eric Anholt | 2019-03-05 | 2 | -14/+77
  On V3D 4.x, we can use ldunifrf to load uniforms to any register, and this will let us schedule the ldunif wherever we want in the program.
* v3d: Drop the old class bits splitting up the accumulators. | Eric Anholt | 2019-03-05 | 1 | -7/+3
  This seems to be left over from vc4, and I don't use them any more.
* v3d: Add support for vir-to-qpu of ldunif instructions to a temp. | Eric Anholt | 2019-03-05 | 1 | -2/+15
  We can load a uniform to any register, so add support for non-ALU instructions with sig.ldunif to a temp.
* v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. | Eric Anholt | 2019-03-05 | 10 | -123/+77
  I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.
* v3d: Do uniform rematerialization spilling before dropping threadcount | Eric Anholt | 2019-03-05 | 1 | -8/+10
  This feels like the right tradeoff for threads vs uniforms, particularly given that we often have very short thread segments right now:
  total instructions in shared programs: 6411504 -> 6413571 (0.03%)
  total threads in shared programs: 153946 -> 154214 (0.17%)
  total uniforms in shared programs: 2387665 -> 2393604 (0.25%)
* v3d: Fix temporary leaks of temp_registers and when spilling. | Eric Anholt | 2019-03-05 | 1 | -5/+4
  On each iteration of successfully spilling a reg, we'd allocate another copy of temp_registers, and when decrementing thread count we'd allocate another copy of the graph. These all got cleaned up on freeing the compile.
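The leak described here is the familiar pattern of allocating a fresh buffer on every retry without releasing the previous one. In v3d the copies were owned by the compile context and freed with it; the minimal sketch below uses plain malloc/free instead and a hypothetical try_allocate() step, so none of these names come from the v3d source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical allocation attempt: succeeds once enough values have
 * been spilled. */
static bool try_allocate(int spill_count, int spills_needed)
{
        return spill_count >= spills_needed;
}

/* Retries allocation, spilling one more value each time.  The scratch
 * array (standing in for temp_registers) is freed before each new
 * allocation, so at most one copy is alive at a time.  Returns the
 * number of spills used, or -1 if memory runs out. */
static int allocate_with_spilling(int num_temps, int spills_needed)
{
        int *temp_registers = NULL;
        int spills = 0;

        assert(num_temps > 0);

        for (;;) {
                free(temp_registers); /* release the previous attempt's copy */
                temp_registers = calloc((size_t)num_temps,
                                        sizeof(*temp_registers));
                if (!temp_registers)
                        return -1;

                if (try_allocate(spills, spills_needed))
                        break;
                spills++;
        }

        free(temp_registers);
        return spills;
}
```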
* v3d: Move the stores for fixed function VS output reads into NIR. | Eric Anholt | 2019-03-05 | 4 | -195/+334
  This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions.
  total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
  total threads in shared programs: 153924 -> 153934 (<.01%)
  total loops in shared programs: 486 -> 483 (-0.62%)
  total uniforms in shared programs: 2385436 -> 2388195 (0.12%)
  Acked-by: Ian Romanick <[email protected]> (nir)
* v3d: Translate f2i(fround_even) as FTOIN. | Eric Anholt | 2019-03-05 | 1 | -2/+9
  This appears to be just what the opcode does. Needed for equivalence when moving FF VPM stores into NIR.
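f2i(fround_even(x)) rounds to the nearest integer with ties going to the even neighbor, then converts. In portable C this matches (int32_t)rintf(x) under the default round-to-nearest mode; the sketch below spells the rule out to avoid a libm dependency. It models only in-range inputs; how FTOIN treats NaN or out-of-range values is not modeled here:

```c
#include <assert.h>
#include <stdint.h>

/* Models f2i(fround_even(x)) for inputs that fit in int32_t:
 * round to nearest, ties to even, then convert. */
static int32_t f2i_round_even(float x)
{
        assert(x >= -2147483648.0f && x < 2147483648.0f);

        int32_t i = (int32_t)x;    /* truncates toward zero */
        float frac = x - (float)i; /* exact fractional part, same sign as x */

        if (frac > 0.5f || (frac == 0.5f && (i & 1)))
                i++;
        else if (frac < -0.5f || (frac == -0.5f && (i & 1)))
                i--;
        return i;
}
```

Ties are what distinguish this from ordinary rounding: 2.5 goes to 2 and 3.5 goes to 4.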
* v3d: Stop treating exec masking specially. | Eric Anholt | 2019-03-05 | 3 | -14/+3
  In our backend, the successor edges from the blocks only point to where QPU control flow goes, not where the notional control flow goes from a "break" or "continue" modifying the execution mask to resume writing to some channels later. As a result, this attempt at restricting live ranges ended up missing the live range of a value where a conditional break/continue was present in a loop before the later def of a variable. The previous commit ended up fixing the problem that the flag tried to solve.
  Fixes glsl-vs-loop-continue.shader_test and/or glsl-vs-loop-redundant-condition.shader_test based on register allocation results.
* v3d: Restrict live intervals to the blocks reachable from any def. | Eric Anholt | 2019-03-05 | 2 | -4/+43
  In the backend, we often have condition codes on writes to variables, such that there's no screening def anywhere and the previous live ranges algorithm would conclude that the start of the range extends to the start of the program. However, we do know that the live range can only extend as early as you can reach from all blocks writing to the variable.
  The motivation was that, while we have a couple of hacks to try to promote conditional writes up to being a def within the block, the exec_mask one was broken and needed a replacement.
  Based on c3c1aa5aeb92 ("intel/fs: Restrict live intervals to the subset possibly reachable from any definition.").
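The restriction can be pictured as a forward reachability pass: start from every block that writes the variable and follow successor edges until nothing changes; a block never reached cannot lie inside the live range. A minimal sketch on a CFG encoded as successor bitmasks (at most 32 blocks); the encoding is an assumption for illustration, not the v3d data structure:

```c
#include <assert.h>
#include <stdint.h>

/* succ[b] has bit s set when block s is a successor of block b.
 * def_blocks has bit b set when block b writes the variable.
 * Returns the set of blocks reachable from any def block, including
 * the def blocks themselves. */
static uint32_t blocks_reachable_from_defs(const uint32_t *succ,
                                           int num_blocks,
                                           uint32_t def_blocks)
{
        uint32_t reached = def_blocks;
        uint32_t prev;

        assert(num_blocks <= 32);

        /* Iterate to a fixed point. */
        do {
                prev = reached;
                for (int b = 0; b < num_blocks; b++) {
                        if (prev & (1u << b))
                                reached |= succ[b];
                }
        } while (reached != prev);

        return reached;
}
```

In a CFG 0 -> 1 -> 2 -> {1, 3} with the only def in block 2, the loop header (block 1) is reached through the back edge, but block 0 is not, so the range no longer extends to the start of the program.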
* v3d: Rematerialize MOVs of uniforms instead of spilling them. | Eric Anholt | 2019-02-25 | 2 | -27/+68
  If we have a MOV of a uniform value available to spill, that's one of our best choices. We can just not spill the value, and emit a new load of the uniform as the fill. This saves bothering the TMU and the thrsw, and is the same cost in uniforms (since the spill offset is a uniform anyway).
  This doesn't have a huge impact on shader-db, since there aren't a whole lot of spills and we usually copy-prop the uniforms at the VIR level such that the only uniform MOVs are from vir_lower_uniforms:
  total instructions in shared programs: 6430292 -> 6430279 (<.01%)
  total uniforms in shared programs: 2386023 -> 2385787 (<.01%)
  total spills in shared programs: 4961 -> 4960 (-0.02%)
  total fills in shared programs: 6352 -> 6350 (-0.03%)
  However, I'm interested in dropping the uniforms copy-prop in the backend, since it would be cheaper to not load repeated uniforms if we have the registers to spare.
  This also saves many spills on dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what motivated a bunch of my recent backend work in the first place:
  before: 46 spills, 106 fills, 3062 instructions
  after: 0 spills, 0 fills, 2611 instructions
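The preference above can be expressed as a spill-cost rule: a temp defined by a plain MOV of a uniform is nearly free to "spill", because each fill is just another uniform load. A minimal sketch of choosing a candidate under that rule; the struct fields and the cost weights are hypothetical, not the heuristics v3d actually uses:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-temp summary used to choose what to spill. */
struct spill_candidate {
        bool is_uniform_mov; /* defined by a raw MOV of a uniform */
        int num_uses;        /* each use would need a fill */
};

/* Rematerializable uniform MOVs only re-load the uniform at each use;
 * other temps need a TMU store plus a TMU load (and thread switch) per
 * use, weighted here as an arbitrary 10 per memory access. */
static int spill_cost(const struct spill_candidate *c)
{
        if (c->is_uniform_mov)
                return c->num_uses;
        return 10 * (1 + c->num_uses);
}

/* Returns the index of the cheapest candidate, or -1 if there are none. */
static int choose_spill(const struct spill_candidate *cands, int n)
{
        int best = -1;

        assert(n >= 0);
        for (int i = 0; i < n; i++) {
                if (best < 0 ||
                    spill_cost(&cands[i]) < spill_cost(&cands[best]))
                        best = i;
        }
        return best;
}
```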
* v3d: Dump the VIR after register spilling if we were forced to. | Eric Anholt | 2019-02-25 | 1 | -0/+10
  Spilling is unusual, but one often has to debug it when it happens, so dump it.
* v3d: Fix vir_is_raw_mov() for input unpacks. | Eric Anholt | 2019-02-25 | 1 | -0/+7
  There are no users at the moment, but I wanted to start using this in register spilling.
* v3d: Move i2b and f2b support into emit_comparison. | Eric Anholt | 2019-02-18 | 1 | -13/+12
  This lets us save a resolve to NIR true/false for ifs and discard_if. No change in shader-db.
* v3d: Emit a simpler negate for the iabs implementation. | Eric Anholt | 2019-02-18 | 1 | -2/+1
  One program affected in my shader-db.
  instructions in affected programs: 110 -> 108 (-1.82%)
* v3d: Delay emitting ldvpm on V3D 4.x until it's actually used. | Eric Anholt | 2019-02-18 | 1 | -6/+43
  For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need to do VPM setup when the load_inputs are out of order. For V3D 4.x, we can reduce register pressure by delaying our loads until they're actually needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump.
  total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
  total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
  total threads in shared programs: 153864 -> 153906 (0.03%)
* v3d: Stop tracking num_inputs for VPM loads. | Eric Anholt | 2019-02-18 | 4 | -7/+2
  It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.
* v3d: Add a function to describe what the c->execute.file check means. | Eric Anholt | 2019-02-18 | 2 | -8/+14
  This is what pointed out that we were misusing the check for last_thrsw in the previous commit.
* v3d: Fix the check for "is the last thrsw inside control flow" | Eric Anholt | 2019-02-18 | 2 | -8/+17
  The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs.
  No known tests fixed, noticed while doing a refactor.
  Fixes: 080506057310 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
* v3d: Fix f2b32 behavior. | Eric Anholt | 2019-02-18 | 1 | -1/+6
  Now that we don't have the vir_PF() magic, it's obvious that we were doing the wrong thing for f2b32 by allowing -0.0 to produce true instead of false.
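f2b32 must produce true only for non-zero floats, and -0.0 counts as zero. A floating-point comparison gets this right automatically, because IEEE 754 compares -0.0 equal to +0.0; a test on the raw bit pattern does not, since -0.0 has only the sign bit set. The message doesn't say exactly which integer-style test the old code ended up with, so this minimal sketch just contrasts the two behaviors (function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Correct: IEEE 754 compares -0.0 equal to +0.0, so this is false. */
static bool f2b_correct(float x)
{
        return x != 0.0f;
}

/* Wrong: -0.0 is 0x80000000, which is non-zero as an integer. */
static bool f2b_bitwise(float x)
{
        uint32_t bits;
        memcpy(&bits, &x, sizeof(bits));
        return bits != 0;
}
```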
* v3d: Kill off vir_PF(), which is hard to use right. | Eric Anholt | 2019-02-18 | 3 | -70/+36
  You were allowed to pass in any old temp so that you could hopefully fold the PF up into the def of the temp. If we couldn't find one, it implicitly generated a MOV(nop, reg). However, that PF could have different behavior depending on whether the def being folded into was a float or int opcode, which the caller doesn't necessarily control.
  Due to the fragility of the function, just switch all callers over to vir_set_pf(). This also encourages the callers to use a _dest call for the inst they're putting the PF on, eliminating a bunch of temps in the pre-optimization VIR.
  shader-db says the change is in the noise:
  total instructions in shared programs: 6226247 -> 6227184 (0.02%)
  instructions in affected programs: 851068 -> 852005 (0.11%)
* v3d: Do bool-to-cond for discard_if as well. | Eric Anholt | 2019-02-18 | 1 | -16/+12
  Turns this minimal conditional discard (glsl-fs-discard-01.shader_test):
    0x3de0b086c5fe9000 fcmp.pushn -, r1, r5; mov r2, 0
    0x3dec3086bbfc001f nop ; mov.ifa r2, -1
    0x3c047186bbe80000 nop ; mov.pushz -, r2
    0x3dea3186ba837000 setmsf.ifna -, 0 ; nop
  into:
    0x3c00b186c582a000 fcmp.pushn -, r2, r5; nop
    0x3de83186ba837000 setmsf.ifa -, 0 ; nop
  total instructions in shared programs: 6229820 -> 6226247 (-0.06%)
* v3d: Refactor bcsel and if condition handling. | Eric Anholt | 2019-02-18 | 1 | -29/+14
  Both were doing the same thing to try to get a condition to predicate on. Noticed when I wanted to do this for discard_if as well.
  No change in shader-db.
* v3d: Add a helper function for getting a nop register. | Eric Anholt | 2019-02-18 | 5 | -14/+18
  Just a little refactor to explain what's going on with QFILE_NULL.
* v3d: Drop our hand-lowered nir_op_ffract. | Eric Anholt | 2019-02-18 | 1 | -3/+1
  The NIR lowering works fine, though it causes some slight noise due to what looks like choices about propagating constants up multiply chains changing.
  total instructions in shared programs: 6229671 -> 6229820 (<.01%)
  total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
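As far as we know, NIR's generic lowering expresses ffract(x) as x - floor(x), which yields a value in [0, 1) for negative inputs as well. A minimal C model of that formula; floor_small() is a hypothetical helper that assumes the input fits in int32_t, used only to avoid a libm dependency:

```c
#include <assert.h>
#include <stdint.h>

/* floor() for inputs that fit in int32_t. */
static float floor_small(float x)
{
        assert(x >= -2147483648.0f && x < 2147483648.0f);
        float t = (float)(int32_t)x; /* truncates toward zero */
        return t > x ? t - 1.0f : t;
}

/* ffract(x) = x - floor(x). */
static float ffract(float x)
{
        return x - floor_small(x);
}
```

Note that ffract(-1.25) is 0.75, not -0.25, because floor rounds toward negative infinity.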
* v3d: Drop a perf note about merging unpack_half_*, which has been implemented. | Eric Anholt | 2019-02-18 | 1 | -3/+0
  This is handled with copy-propagation now.
* v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x. | Eric Anholt | 2019-02-18 | 1 | -5/+4
  Fixes some stalls in 3DMMES's main vertex shader.
  total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
  instructions in affected programs: 2935050 -> 2865569 (-2.37%)
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. | Eric Anholt | 2019-02-18 | 3 | -15/+16
  Apparently we need disable-EZ flagged, not just "does Z writes".
  Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation.
  Signed-off-by: Eric Anholt <[email protected]>
  Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
* v3d: Use the NIR lowering for isign instead of rolling our own. | Eric Anholt | 2019-02-14 | 1 | -16/+1
  min/max instead of comparisons saves 2 instructions on fs-sign-int.shader_test.
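The min/max form is a clamp to [-1, 1]: isign(x) = imin(imax(x, -1), 1), which gives -1, 0, or 1 with no comparisons or selects. A minimal C model of that identity (imin/imax are written out here, not taken from any library):

```c
#include <assert.h>
#include <stdint.h>

static int32_t imax(int32_t a, int32_t b)
{
        return a > b ? a : b;
}

static int32_t imin(int32_t a, int32_t b)
{
        return a < b ? a : b;
}

/* isign as a clamp: -1 for negative, 0 for zero, 1 for positive. */
static int32_t isign(int32_t x)
{
        return imin(imax(x, -1), 1);
}
```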
* v3d: Whitespace consistency fix. | Eric Anholt | 2019-02-05 | 1 | -1/+1
* v3d: Fix copy-propagation of input unpacks. | Eric Anholt | 2019-02-05 | 5 | -35/+94
  I had a single function for "does this do float input unpacking" with two major flaws: it was missing the most common thing to try to copy-propagate an f32 input unpack into (the VFPACK to an FP16 render target) along with several other ALU ops, and it would also try to propagate an f32 unpack into a VFMUL, which only does f16 unpacks.
  instructions in affected programs: 659232 -> 655895 (-0.51%)
  uniforms in affected programs: 132613 -> 135336 (2.05%)
  and a couple of programs increase their thread counts.
  The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.
* v3d: Fix input packing of .l for rounding/fdx/fdy. | Eric Anholt | 2019-02-05 | 2 | -1/+2
  Avoids a regression in dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start copy-propagating more input packs.
* v3d: Fix pack/unpack of VFPACK operand unpacks. | Eric Anholt | 2019-02-05 | 2 | -1/+33
  We want to be able to copy propagate our texture unpacks into the vfpack.
* v3d: Fix dumping of shaders with alpha test. | Eric Anholt | 2019-02-05 | 1 | -1/+3
  We were trying to print a NULL entry from the table.
* v3d: Store the actual mask of color buffers present in the key. | Eric Anholt | 2019-02-05 | 2 | -5/+6
  If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).
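Keying on a mask instead of a count means the shader only emits writes for render targets that are actually bound. A minimal sketch of iterating such a mask; the struct and the cbufs field name are assumptions for illustration, not the real v3d key layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shader key field: bit i set when render target i is bound. */
struct fs_key_sketch {
        uint8_t cbufs;
};

/* Records the render-target indices that would get a color write into
 * rts[] (lowest first) and returns how many there are.  Only bound
 * targets appear, so binding just RT 1 never produces a write to RT 0. */
static int color_writes(const struct fs_key_sketch *key, int rts[8])
{
        int n = 0;

        for (int rt = 0; rt < 8; rt++) {
                if (key->cbufs & (1u << rt))
                        rts[n++] = rt;
        }
        assert(n <= 8);
        return n;
}
```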
* v3d: Fix image_load_store clamping of signed integer stores. | Eric Anholt | 2019-01-31 | 1 | -1/+1
  This was a copy-and-paste failure that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats.
  Fixes: 6281f26f064a ("v3d: Add support for shader_image_load_store.")
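The message doesn't say which bounds the copy-and-paste got wrong, but as a reference point, a signed N-bit channel has to be clamped to [-2^(N-1), 2^(N-1) - 1], e.g. [-128, 127] for rgba8i, unlike the unsigned [0, 2^N - 1]. A minimal C model of the signed clamp (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit signed value to the range of a signed channel that is
 * `bits` wide, e.g. [-128, 127] for the 8-bit channels of rgba8i. */
static int32_t clamp_signed_channel(int32_t v, int bits)
{
        assert(bits >= 1 && bits <= 31);

        int32_t max = (int32_t)((1u << (bits - 1)) - 1);
        int32_t min = -max - 1;

        if (v < min)
                return min;
        if (v > max)
                return max;
        return v;
}
```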
* v3d: Fix a release build set-but-unused compiler warning. | Eric Anholt | 2019-01-29 | 1 | -1/+2
* vc4: Declare the last cpu pointer as being modified in NEON asm. | Emil Velikov | 2019-01-29 | 1 | -2/+1
  The earlier commit addressed 7 of the 8 instances.
  v2: Rebase patch back to master (by anholt)
  Cc: Carsten Haitzler (Rasterman) <[email protected]>
  Cc: Eric Anholt <[email protected]>
  Fixes: 300d3ae8b14 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
  Signed-off-by: Emil Velikov <[email protected]>