| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SFU operations have a latency of 2 cicles, so if their results
are used in the following cycle to a SFU instruction, the GPU
stalls for an extra cycle until the result is available.
This adds the number of stalls to the shader-db debug mode and
sum of instruction + stalls to evaluate optimizations to schedule
instructions that avoid generating sfu-stalls.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We already have code to print these signals but the early return in the code
that checks if any signals are present present was missing the checks for them,
so it would skip printing them unless they were paired with other signals.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results:
total instructions in shared programs: 9117550 -> 9102719 (-0.16%)
instructions in affected programs: 1752873 -> 1738042 (-0.85%)
helped: 7076
HURT: 478
helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2
helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07%
HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1
HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54%
95% mean confidence interval for instructions value: -2.00 -1.92
95% mean confidence interval for instructions %-change: -1.58% -1.50%
Instructions are helped.
total max-temps in shared programs: 1327774 -> 1327728 (<.01%)
max-temps in affected programs: 1025 -> 979 (-4.49%)
helped: 47
HURT: 2
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17%
95% mean confidence interval for max-temps value: -1.06 -0.82
95% mean confidence interval for max-temps %-change: -8.89% -5.49%
Max-temps are helped.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
Fixes some stalls in 3DMMES's main vertex shader.
total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
instructions in affected programs: 2935050 -> 2865569 (-2.37%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I had a single function for "does this do float input unpacking" with two
major flaws: It was missing the most common thing to try to copy propagate
a f32 input nunpack to (the VFPACK to an FP16 render target) along with
several other ALU ops, and also would try to propagate an f32 unpack into
a VFMUL which only does f16 unpacks.
instructions in affected programs: 659232 -> 655895 (-0.51%)
uniforms in affected programs: 132613 -> 135336 (2.05%)
and a couple of programs increase their thread counts.
The uniforms hit appears to be a pattern in generated code of doing (-a >=
a) comparisons, which when a is abs(b) can result in the abs instruction
being copy propagated once but not fully DCEed.
|
|
|
|
|
|
| |
Avoids a regression in
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start
copy-propagating more input packs.
|
|
|
|
| |
We want to be able to copy propagate our texture unpacks into the vfpack.
|
|
|
|
|
| |
total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
instructions in affected programs: 800373 -> 799407 (-0.12%)
|
|
|
|
| |
I wanted to reuse it for DCE of flags updates.
|
|
|
|
|
| |
If you're trying to trace what's going on in a QPU dump, this will
definitely help you find your way.
|
|
|
|
| |
Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).
Tested-By: Gert Wollny <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
This instruction is used to ensure that TMU stores have been processed
before moving on. In particular, you need any TMU ops to be done by the
time the shader ends.
|
|
|
|
|
| |
Reported-by: Rob Clark <[email protected]>
Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
|
|
|
|
|
| |
These instructions allow writing the result to any register, instead of a
special writeback to r4.
|
|
|
|
|
| |
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
|
|
|
|
|
| |
If we fail initial disassembly, it's good to know what instruction it was
that failed.
|
|
|
|
| |
This will be used for detecting last thread segment in register spilling.
|
|
|
|
|
| |
These helpers will be used in register spilling to determine where to add
a last thrsw if needed, and might help refactor QPU scheduling.
|
|
|
|
|
| |
The QPU scheduling code calling this function already separately checked
this signal.
|
|
|
|
| |
This will be reused in register spilling.
|
|
|
|
|
| |
After the 4.1 spec, 4.2 retroactively renamed patchid to barrierid because
it's used for other barriers in compute.
|
|
|
|
|
|
| |
The V3D 3.x series of TMU writes with meaning depending on the texture
type is replaced with writes to specific registers for each texture
argument semantic.
|
|
|
|
|
| |
I had a .ifb being decoded weird in sampid, so this is to check that .ifb
is fine.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.
VPM loads also gain the ability to be reordered by packing the row into
the A argument. They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
|
|
|
|
|
| |
I had all the packing code in this file at one point, but these defines
now live in qpu_pack.c.
|
| |
|
|
|
|
| |
Signals are more complicated than that, and tables ended up being better.
|
|
|
|
|
|
|
| |
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
|
| |
|
|
|
|
|
|
|
| |
Don't use intermediate variables, use consistent whitespace.
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
This shockingly ended up working out, because only the first byte of *sig
is used and (sizeof(*sig) != 0) == 1. Fixes a compiler warning.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=104183
|
|
|
|
|
|
|
| |
The v3d_qpu_writes_r*() were only checking for fixed-function accumulator
writes, not normal ALU writes to those regs.
Fixes fs-discard-exit-2 on simulation (but not HW).
|
|
|
|
|
| |
I typoed and was depending on v3d_xml.h (the gzipped xml)_, not on the
v3d_packet_v33_pack.h that the compiler and QPU packing actually use.
|
|
|
|
|
|
|
| |
v2: Default vc5 to off, since it requires the simulator currently. Add
missing dep on the XML generation from libbroadcom_vc5.
Reviewed-by: Dylan Baker <[email protected]> (v1)
|
|
Unlike VC4, I've defined an unpacked instruction format with pack/unpack
functions to convert to 64-bit encoded instructions. This will let us
incrementally put together our instructions and validate them in a more
natural way than the QPU_GET_FIELD/QPU_SET_FIELD used to.
The pack/unpack unfortuantely are written by hand. While I could define
genxml for parts of it, there are many special cases (like operand order
of commutative binops choosing which binop is being performed!) and it
probably wouldn't come out much cleaner.
The disasm unit test ensures that we have the same assembly format as
Broadcom's internal tools, other than whitespace changes.
v2: Fix automake variable redefinition complaints, add test to .gitignore
|