summaryrefslogtreecommitdiffstats
path: root/src/broadcom/qpu
Commit message (Collapse)AuthorAgeFilesLines
* tree-wide: replace MAYBE_UNUSED with ASSERTEDEric Engestrom2019-07-311-1/+1
| | | | | | Suggested-by: Jason Ekstrand <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* v3d: add shader-db stat to count SFU stallsJose Maria Casanova Crespo2019-07-222-12/+23
| | | | | | | | | | | | | | SFU operations have a latency of 2 cicles, so if their results are used in the following cycle to a SFU instruction, the GPU stalls for an extra cycle until the result is available. This adds the number of stalls to the shader-db debug mode and sum of instruction + stalls to evaluate optimizations to schedule instructions that avoid generating sfu-stalls. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <[email protected]>
* v3d: handle ldtlb and ldtlbu signals during disassemblyIago Toral Quiroga2019-07-121-0/+2
| | | | | | | | We already have code to print these signals but the early return in the code that checks if any signals are present present was missing the checks for them, so it would skip printing them unless they were paired with other signals. Reviewed-by: Eric Anholt <[email protected]>
* v3d: implement simultaneous peripheral access exceptions for V3D 4.1+Iago Toral Quiroga2019-06-182-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results: total instructions in shared programs: 9117550 -> 9102719 (-0.16%) instructions in affected programs: 1752873 -> 1738042 (-0.85%) helped: 7076 HURT: 478 helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2 helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07% HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1 HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54% 95% mean confidence interval for instructions value: -2.00 -1.92 95% mean confidence interval for instructions %-change: -1.58% -1.50% Instructions are helped. total max-temps in shared programs: 1327774 -> 1327728 (<.01%) max-temps in affected programs: 1025 -> 979 (-4.49%) helped: 47 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1 helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17% 95% mean confidence interval for max-temps value: -1.06 -0.82 95% mean confidence interval for max-temps %-change: -8.89% -5.49% Max-temps are helped. Reviewed-by: Eric Anholt <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-291-1/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x.Eric Anholt2019-02-181-5/+4
| | | | | | | Fixes some stalls in 3DMMES's main vertex shader. total instructions in shared programs: 6280751 -> 6211270 (-1.11%) instructions in affected programs: 2935050 -> 2865569 (-2.37%)
* v3d: Fix copy-propagation of input unpacks.Eric Anholt2019-02-052-0/+69
| | | | | | | | | | | | | | | | | I had a single function for "does this do float input unpacking" with two major flaws: It was missing the most common thing to try to copy propagate a f32 input nunpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and also would try to propagate an f32 unpack into a VFMUL which only does f16 unpacks. instructions in affected programs: 659232 -> 655895 (-0.51%) uniforms in affected programs: 132613 -> 135336 (2.05%) and a couple of programs increase their thread counts. The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.
* v3d: Fix input packing of .l for rounding/fdx/fdy.Eric Anholt2019-02-052-1/+2
| | | | | | Avoids a regression in dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start copy-propagating more input packs.
* v3d: Fix pack/unpack of VFPACK operand unpacks.Eric Anholt2019-02-052-1/+33
| | | | We want to be able to copy propagate our texture unpacks into the vfpack.
* v3d: Fold comparisons for IF conditions into the flags for the IF.Eric Anholt2019-01-022-0/+19
| | | | | total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)
* v3d: Move "does this instruction have flags" from sched to generic helpers.Eric Anholt2018-12-302-0/+43
| | | | I wanted to reuse it for DCE of flags updates.
* v3d: Do uniform pretty-printing in the QPU dump.Eric Anholt2018-12-142-0/+15
| | | | | If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.
* v3d: Add missing flagging of SYNCB as a TSY op.Eric Anholt2018-12-141-0/+1
| | | | Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
* meson: Add tests to suitesDylan Baker2018-11-201-1/+2
| | | | | | | | | | | | | | | | Meson test has a concepts of suites, which allow tests to be grouped together. This allows for a subtest of tests to be run only (say only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). it can be invoked as: `meson test -C builddir --suite $suitename` (meson test has addition options that are pretty useful). Tested-By: Gert Wollny <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* v3d: Add support for the TMUWT instruction.Eric Anholt2018-07-312-0/+9
| | | | | | This instruction is used to ensure that TMU stores have been processed before moving on. In particular, you need any TMU ops to be done by the time the shader ends.
* vc4: Fix meson build when enabled without v3d.Eric Anholt2018-07-291-0/+2
| | | | | Reported-by: Rob Clark <[email protected]> Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
* v3d: Switch to using the new SFU instructions on V3D 4.x.Eric Anholt2018-07-232-0/+31
| | | | | | | | | | | | | | | | These instructions let us write directly to the phys regfile, instead of just R4. That lets us avoid moving out of R4 to avoid conflicting with other SFU results, and to avoid conflicting with thread switches. There is still an extra instruction of latency, which is not represented in the scheduler at the moment. If you use the result before it's ready, the QPU will just stall, unlike the magic R4 mode where you'd read the previous value. That means that the following shader-db results aren't quite representative (since we now cause some stalls instead of emitting nops), but they're impressive enough that I'm happy with the change. total instructions in shared programs: 95669 -> 91275 (-4.59%) instructions in affected programs: 82590 -> 78196 (-5.32%)
* v3d: Add QPU pack/unpack for the new SFU instructions.Eric Anholt2018-07-234-0/+32
| | | | | These instructions allow writing the result to any register, instead of a special writeback to r4.
* v3d: Fix the name of the "flpop" operation.Eric Anholt2018-07-234-4/+5
| | | | | Noticed while trying to sort a new op into the appropriate place to match the documentation.
* v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.Eric Anholt2018-07-231-2/+3
| | | | | If we fail initial disassembly, it's good to know what instruction it was that failed.
* broadcom/vc5: Add a QPU helper for instructions using the TLB.Eric Anholt2018-03-192-0/+23
| | | | This will be used for detecting last thread segment in register spilling.
* broadcom/vc5: Introduce v3d_qpu_reads_vpm()/v3d_qpu_writes_vpm().Eric Anholt2018-03-192-4/+35
| | | | | These helpers will be used in register spilling to determine where to add a last thrsw if needed, and might help refactor QPU scheduling.
* broadcom/vc5: The ldvpm signal also a case of using the VPM.Eric Anholt2018-03-191-0/+3
| | | | | The QPU scheduling code calling this function already separately checked this signal.
* broadcom/vc5: Extract v3d_qpu_writes_tmu() helper.Eric Anholt2018-03-192-0/+11
| | | | This will be reused in register spilling.
* broadcom/vc5: Update QPU instruction pack/unpack for v4.2.Eric Anholt2018-01-274-5/+9
| | | | | After the 4.1 spec, 4.2 retroactively renamed patchid to barrierid because it's used for other barriers in compute.
* broadcom/vc5: Add the new TMU write addresses for V3D 4.x (and r5rep).Eric Anholt2018-01-122-10/+37
| | | | | | The V3D 3.x series of TMU writes with meaning depending on the texture type is replaced with writes to specific registers for each texture argument semantic.
* broadcom/vc5: Add a test for .ifb in ADD ops.Eric Anholt2018-01-121-0/+1
| | | | | I had a .ifb being decoded weird in sampid, so this is to check that .ifb is fine.
* broadcom/vc5: Add the new tesselation opcodes in V3D 4.1.Eric Anholt2018-01-122-1/+5
|
* broadcom/vc5: Use the new LDVPM/STVPM opcodes on V3D 4.1.Eric Anholt2018-01-124-12/+110
| | | | | | | | | | | Now, instead of a magic write register for VPM stores we have an instruction to do them (which means no packing of other ALU ops into it), with the ability to reorder the VPM stores due to the offset being baked into the instruction. VPM loads also gain the ability to be reordered by packing the row into the A argument. They also no longer write to the r3 accumulator, and instead must be stored to a physical register.
* broadcom/vc5: Drop dead VC5_QPU_* defines from qpu_instr.c.Eric Anholt2018-01-121-80/+0
| | | | | I had all the packing code in this file at one point, but these defines now live in qpu_pack.c.
* broadcom/vc5: Add support for QPU pack/unpack/disasm of small immediates.Eric Anholt2018-01-124-1/+94
|
* broadcom/vc5: Drop signal bit #defines.Eric Anholt2018-01-122-8/+0
| | | | Signals are more complicated than that, and tables ended up being better.
* broadcom/vc5: Add support for V3Dv4 signal bits.Eric Anholt2018-01-125-24/+220
| | | | | | | The WRTMUC replaces the implicit uniform loads in the first two texture instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU, LDTLB, and LDUNIF*RF now write to arbitrary registers, which required passing the devinfo through to a few more functions.
* broadcom/vc5: Fix pack/unpack of vfmul input unpack flags.Eric Anholt2018-01-122-0/+40
|
* meson: Use consistent style for testsDylan Baker2018-01-111-4/+8
| | | | | | | Don't use intermediate variables, use consistent whitespace. Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* broadcom/vc5: Fix a typo in memcmp for sig unpack checking.Eric Anholt2017-12-141-1/+1
| | | | | | | This shockingly ended up working out, because only the first byte of *sig is used and (sizeof(*sig) != 0) == 1. Fixes a compiler warning. Link: https://bugs.freedesktop.org/show_bug.cgi?id=104183
* broadcom/vc5: Fix scheduling for a non-SFU R4 write after a dead R4 write.Eric Anholt2017-11-071-2/+28
| | | | | | | The v3d_qpu_writes_r*() were only checking for fixed-function accumulator writes, not normal ALU writes to those regs. Fixes fs-discard-exit-2 on simulation (but not HW).
* meson: Fix vc5 deps on the XML-generated headers.Eric Anholt2017-10-201-1/+1
| | | | | I typoed and was depending on v3d_xml.h (the gzipped xml)_, not on the v3d_packet_v33_pack.h that the compiler and QPU packing actually use.
* meson: Add support for the vc5 driver.Eric Anholt2017-10-171-0/+39
| | | | | | | v2: Default vc5 to off, since it requires the simulator currently. Add missing dep on the XML generation from libbroadcom_vc5. Reviewed-by: Dylan Baker <[email protected]> (v1)
* broadcom: Add V3D 3.3 QPU instruction pack, unpack, and disasm.Eric Anholt2017-10-108-0/+2746
Unlike VC4, I've defined an unpacked instruction format with pack/unpack functions to convert to 64-bit encoded instructions. This will let us incrementally put together our instructions and validate them in a more natural way than the QPU_GET_FIELD/QPU_SET_FIELD used to. The pack/unpack unfortuantely are written by hand. While I could define genxml for parts of it, there are many special cases (like operand order of commutative binops choosing which binop is being performed!) and it probably wouldn't come out much cleaner. The disasm unit test ensures that we have the same assembly format as Broadcom's internal tools, other than whitespace changes. v2: Fix automake variable redefinition complaints, add test to .gitignore