| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
| |
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.
v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
In common we can use implementation for Vulkan.
|
|
|
|
|
| |
We were missing a * 4 even if the particular hardware matched our
assumption.
|
|
|
|
|
| |
A shader invocation always executes 16 channels together, so we often end
up multiplying things by this magic 16 number. Give it a name.
|
|
|
|
|
|
|
| |
4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in
the CTS at 8k and 16k. 4k at least lets us get one 4k display working.
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier commit addressed 7 of the 8 instances available.
v2: Rebase patch back to master (by anholt)
Cc: Carsten Haitzler (Rasterman) <[email protected]>
Cc: Eric Anholt <[email protected]>
Fixes: 300d3ae8b14 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
(commit message by anholt)
Fixes: 4d30024238ef ("vc4: Use NEON to speed up utile loads on Pi2.")
|
|
|
|
|
|
|
| |
This makes the asm code more intelligible and clarifies the functional
change in the next commit.
(commit message and commit squashing by anholt)
|
|
|
|
|
|
| |
This is the GLES 3.2 minmax, and also what the closed source driver does.
Avoids hitting OOMs in the CTS's
dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
|
|
|
|
|
| |
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
|
|
|
|
|
|
|
|
|
| |
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
|
|
|
|
|
|
| |
Now that V3D has 8 byte per pixel formats exposed, we've got stride==32
utiles to load and store. Just handle them through the non-NEON paths for
now.
|
|
|
|
|
| |
These implementations of whole-utile load/stores would be the same for
v3d, though the layouts of blocks of utiles has changed.
|
|
|
|
|
|
|
| |
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
| |
A few of the upcoming changes would make the V3D_DEBUG=cl output less
readable, so let's make proper CLIF file production be under a separate
V3D_DEBUG=clif flag.
|
|
|
|
| |
The #define existed and was checked in the driver.
|
|
|
|
|
|
|
|
|
| |
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning
Signed-off-by: jenny.q.cao <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
| |
|
|
|
|
|
| |
This will be used by vc5 for prefixing functions and including the pack
header in v3d-version-dependent code, following the model of anv.
|
|
|
|
|
|
|
| |
Unlike vc4, where the compiler and gallium driver live together, for vc5
the compiler will live up in the shared broadcom directory, and need
access to the debug flags. Define a set of debug flags and helpers there,
so it can be shared between compiler, vc5, and vulkan.
|
|
This will be used by the VC5 driver and various shared VC4/VC5 tooling,
like the XML decoder.
|