| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use the same register spilling infrastructure for our loads/stores
of indirect access of temp variables, instead of doing an if ladder.
Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db.
Also causes several other KSP shaders with large bodies and large loop
counts to not be force-unrolled.
The change was originally motivated by NOLTIS slightly modifying register
pressure in piglit temp mat4 array read/write tests, triggering register
allocation failures.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our exec masking introduces lots of redundant flags updates, and even
without that there will be cases where NIR comparisons on the same sources
for different reasons may generate the same comparison instruction before
the selection.
total instructions in shared programs: 6492930 -> 6460934 (-0.49%)
total uniforms in shared programs: 2117460 -> 2115106 (-0.11%)
total spills in shared programs: 4983 -> 4987 (0.08%)
total fills in shared programs: 6408 -> 6416 (0.12%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is that for repeated use of the same uniform, we could avoid
loading it on each consumer. The results look pretty good.
total instructions in shared programs: 6413571 -> 6521464 (1.68%)
total threads in shared programs: 154214 -> 154000 (-0.14%)
total uniforms in shared programs: 2393604 -> 2119629 (-11.45%)
total spills in shared programs: 4960 -> 4984 (0.48%)
total fills in shared programs: 6350 -> 6418 (1.07%)
Once we do scheduling at the NIR level, the register pressure (and thus
also instructions) issues we see here will drop back down.
|
|
|
|
|
| |
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
|
|
|
|
|
|
| |
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
|
|
|
|
|
| |
These implementations of whole-utile load/stores would be the same for
v3d, though the layouts of blocks of utiles has changed.
|
|
|
|
|
|
|
|
|
| |
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
|
|
|
|
|
|
| |
The XML ends up noisier if you're only looking at one version, but from
the diffstat there's obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
|
| |
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
| |
|
|
|
|
|
| |
V3D 4.x texturing changes enough that #ifdefs would just make a mess of
it.
|
|
|
|
|
| |
For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to
stop including V3.3 XML.
|
|
|
|
|
| |
I want the library's entrypoints to still be unversioned, but the actual
packet dumping needs to be per-version.
|
| |
|
|
|
|
|
|
|
|
|
| |
The HW has no native sampler support for multisample textures, but since
we only need to support txf_ms and the layout is UIF, we just need to
scale up the texcoords and then add in the sample.
This drops the old TEXTURE_MSAA_ADDR special uniform, since we're treating
MSAA textures as textures, rather than basically texbos like VC4 had to.
|
|
|
|
|
|
|
|
|
|
|
| |
This is a pretty straightforward fork of VC4's NIR compiler to VC5. The
condition codes, registers, and I/O have all changed, making the backend
hard to share, though their heritage is still recognizable.
v2: Move to src/broadcom/compiler to match intel's layout, rename more
"vc5" to "v3d", rename QIR to VIR ("V3D IR") to avoid symbol conflicts
with vc4, use new v3d_debug header, add compiler init/free functions,
do texture swizzling in NIR to allow optimization.
|
|
|
|
|
|
|
|
| |
This will be usable with "VC5_DEBUG=cl" on the vc5 driver to stream a CLIF
file (the Broadcom equivalent of i965's AUB) to stderr. I haven't tested
that this is actually usable with the internal CLIF-consuming tools, but
is close enough as a baseline and is useful for visually inspecting the
command stream.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike VC4, I've defined an unpacked instruction format with pack/unpack
functions to convert to 64-bit encoded instructions. This will let us
incrementally put together our instructions and validate them in a more
natural way than the QPU_GET_FIELD/QPU_SET_FIELD used to.
The pack/unpack unfortuantely are written by hand. While I could define
genxml for parts of it, there are many special cases (like operand order
of commutative binops choosing which binop is being performed!) and it
probably wouldn't come out much cleaner.
The disasm unit test ensures that we have the same assembly format as
Broadcom's internal tools, other than whitespace changes.
v2: Fix automake variable redefinition complaints, add test to .gitignore
|
|
|
|
|
|
|
| |
Unlike vc4, where the compiler and gallium driver live together, for vc5
the compiler will live up in the shared broadcom directory, and need
access to the debug flags. Define a set of debug flags and helpers there,
so it can be shared between compiler, vc5, and vulkan.
|
|
|
|
|
| |
This will be used by the new vc5 gallium driver, and a future Vulkan
driver.
|
|
|
|
|
|
|
| |
This is copied from Intel's XML decoder, modified to handle V3D's
byte-oriented packets.
v2: Squash in robher's fixes for Android
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes `make distcheck`
> make[3]: *** No rule to make target 'common/v3d_devinfo.h', needed by 'distdir'. Stop.
> make[3]: Leaving directory '/home/local/mesa/src/broadcom'
> Makefile:945: recipe for target 'distdir' failed
> make[2]: Leaving directory '/home/local/mesa/src'
> make[2]: *** [distdir] Error 1
> make[1]: *** [distdir] Error 1
Fixes: 427bbbb99c ("broadcom: Introduce a header for talking about chip revisions.")
Cc: Emil Velikov <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
This will be used by the VC5 driver and various shared VC4/VC5 tooling,
like the XML decoder.
|
|
I really liked this idea, as it should help with management of packet
parsing tools like the CL dump. The python script is forked off of theirs
because our packets are byte-based instead of dwords, and the changes to
do so while avoiding performance regressions due to unaligned accesses
were quite invasive.
v2: Fix Android.mk paths, drop shebang for python script, fix overlap
detection.
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Tested-by: Rob Herring <[email protected]>
|