| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
The execute.file check used to be good enough, until I stopped setting up
the execute mask for uniform ifs.
No known tests fixed, noticed while doing a refactor.
Fixes: 080506057310 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
(cherry picked from commit 441294962cd65d44febdbe9ef0b0d99b5d27cec8)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently we need disable-EZ flagged, not just "does Z writes".
Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.
Signed-off-by: Eric Anholt <[email protected]>
Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
(cherry picked from commit cd5e0b272919a654079620adecd2abe24ff51233)
|
|
|
|
|
|
|
|
|
|
| |
instructions"
This reverts commit 378f9967710e9145f2a4f8eee89d87badbe0e6ea.
This also removes the default true argument from the a2xx nir backend,
which was introduced after this commit. There should be no change in
functionality.
|
|
|
|
|
|
|
|
|
| |
This was a copy-and-paste failure that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.
Fixes: 6281f26f064a ("v3d: Add support for shader_image_load_store.")
(cherry picked from commit ab4d5775b0decad7df56245cecad63912ed62b4c)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier commit addressed 7 of the 8 instances available.
v2: Rebase patch back to master (by anholt)
Cc: Carsten Haitzler (Rasterman) <[email protected]>
Cc: Eric Anholt <[email protected]>
Fixes: 300d3ae8b14 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 385843ac3ce1b868d9e24fcb2dbc0c8d5f5a7c99)
|
|
|
|
|
|
| |
Fixes: 6281f26f064ada36b57d45feb68d8e7d783198c9
("v3d: Add support for shader_image_load_store.")
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Noticed while looking at the gitlab-CI MR.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
(commit message by anholt)
Fixes: 4d30024238ef ("vc4: Use NEON to speed up utile loads on Pi2.")
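The constraint change described above can be sketched as follows. This is a minimal illustration, not the driver's code: `copy16` is a hypothetical helper, and the NEON path is only compiled on 32-bit ARM with NEON enabled (other targets fall back to `memcpy`). The post-incrementing loads and stores write the pointer registers back, so the pointers are declared as read-write (`"+r"`) operands, which in turn requires local temporaries because outputs must be lvalues.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 16 bytes from src to dst. */
static void copy16(void *dst, const void *src)
{
#if defined(__arm__) && defined(__ARM_NEON)
    /* Temporaries: asm outputs must be lvalues, and the asm modifies them. */
    const uint8_t *s = src;
    uint8_t *d = dst;

    __asm__ volatile(
        /* "!" requests address writeback, so both registers change. */
        "vld1.8 {d0, d1}, [%[s]]!\n"
        "vst1.8 {d0, d1}, [%[d]]!\n"
        : [s] "+r"(s), [d] "+r"(d) /* read-write: compiler must not reuse */
        :
        : "q0", "memory");
#else
    /* Portable fallback so the sketch builds on other targets. */
    memcpy(dst, src, 16);
#endif
}
```

Had the pointers been listed as plain inputs (`"r"`), the compiler would be entitled to assume the registers still hold the original addresses after the asm statement.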
|
|
|
|
|
|
|
| |
This makes the asm code more intelligible and clarifies the functional
change in the next commit.
(commit message and commit squashing by anholt)
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sampler border color is encoded in the TMU's blending format (half
floats, 32-bit floats, or integers) and must be clamped to the format's
unorm/snorm/int range by the driver. Additionally, the TMU doesn't
know about how we're abusing the swizzle to support BGRA, A, and LA, so we
have to pre-swizzle the border color for those.
We don't really want to spend half a kb on sampler states in most cases,
so skip generating the variants when the border color is unused or is
0,0,0,0.
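A rough sketch of the clamping and pre-swizzling steps described above. The enum, function names, and the choice to leave float formats unclamped are assumptions for illustration; the sketch also assumes BGRA is emulated by exchanging R and B in the sampler swizzle, so the border color gets the same exchange applied up front. The A and LA cases are not covered.

```c
#include <stdint.h>

/* Hypothetical format classes; a real driver would consult its format tables. */
enum border_kind { BORDER_UNORM, BORDER_SNORM, BORDER_FLOAT };

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp an RGBA border color in place to what the format can represent. */
static void clamp_border_color(float color[4], enum border_kind kind)
{
    for (int i = 0; i < 4; i++) {
        if (kind == BORDER_UNORM)
            color[i] = clampf(color[i], 0.0f, 1.0f);
        else if (kind == BORDER_SNORM)
            color[i] = clampf(color[i], -1.0f, 1.0f);
        /* BORDER_FLOAT: left as-is in this sketch. */
    }
}

/* Pre-swizzle for a BGRA view: exchange the red and blue channels. */
static void preswizzle_bgra(float color[4])
{
    float r = color[0];
    color[0] = color[2];
    color[2] = r;
}
```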
|
| |
|
|
|
|
|
|
| |
This is the GLES 3.2 minmax, and also what the closed source driver does.
Avoids hitting OOMs in the CTS's
dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
|
|
|
|
|
| |
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
|
|
|
|
| |
We want one vector size per vector, not per component.
|
| |
|
|
|
|
|
| |
CS shared variables are handled effectively as SSBO access to a temporary
buffer that will be allocated at CS dispatch time.
|
|
|
|
|
|
| |
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn them into the global
invocation id and the local invocation id ivec3s.
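The arithmetic behind that lowering follows the GLSL definitions of the compute built-ins and can be sketched as below. The struct and function names are hypothetical, and unsigned components are used here since the GLSL built-ins are uvec3; this is not the NIR pass itself.

```c
#include <stdint.h>

struct uvec3 { uint32_t x, y, z; };

/* Invert index = z * size.x * size.y + y * size.x + x. */
static struct uvec3 local_id_from_index(uint32_t index, struct uvec3 size)
{
    struct uvec3 id;
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}

/* global id = workgroup id * workgroup size + local id, per component. */
static struct uvec3 global_id(struct uvec3 wg, struct uvec3 size,
                              struct uvec3 local)
{
    struct uvec3 id;
    id.x = wg.x * size.x + local.x;
    id.y = wg.y * size.y + local.y;
    id.z = wg.z * size.z + local.z;
    return id;
}
```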
|
|
|
|
|
|
| |
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
|
|
|
|
|
| |
So far I assume that all the buffers get written. If they weren't, you'd
probably be using UBOs instead.
|
|
|
|
|
|
| |
We've been relying on linking splitting up our varying matrices into
separate vectors, but with SSO that doesn't happen. Supporting matrix
inputs isn't too hard, though.
|
|
|
|
|
|
| |
We need to pass the array index through our coordinate transform
unchanged. Fixes
dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array
|
|
|
|
|
| |
If this flag hasn't been set by the shader and it has some visible side
effects, then we need to disable EZ.
|
|
|
|
| |
This will be needed for SSBOs and image_load_store.
|
|
|
|
|
|
|
|
| |
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Even without any clever optimization on the unpack operations, this gives
us a useful value for the channels read field, which we can use to avoid
ldtmu instructions to the no-op register.
instructions in affected programs: 890712 -> 881974 (-0.98%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can pull a whole vector in a single indirect load. This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.
instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)
Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
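For reading the shader-db lines above, the percentage is the relative change from the old count to the new one. A minimal sketch, with `percent_change` as a hypothetical helper name:

```c
/* Relative change from before to after, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the instruction counts quoted above, `percent_change(3086665, 2454967)` comes out at roughly -20.47, and doubling the thread count from 1710 to 3420 is a 100% increase.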
|
|
|
|
|
|
|
| |
In the process of adding support for SSBOs and CS shared vars, I ended up
needing a helper function for doing TMU general ops. This helper can be
that starting point, and saves us a bunch of round-trips to the TMU by
loading a vector at a time.
|
|
|
|
| |
Moving things to NIR left this mess around. All we lower now is uniforms.
|
|
|
|
| |
I misplaced it in the rebase conflicts.
|
|
|
|
|
|
| |
Before, I had per-stage entrypoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
|
|
|
|
|
|
|
|
|
| |
Loops will be trickier, since we need some analysis to figure out if the
breaks/continues inside are uniform. Until we get that in NIR, this gets
us some quick wins.
total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
instructions in affected programs: 487781 -> 469099 (-3.83%)
|
|
|
|
|
| |
total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
instructions in affected programs: 800373 -> 799407 (-0.12%)
|
|
|
|
|
| |
There could have been a write of a src in between the comparison and the
bcsel that would invalidate the comparison.
|
|
|
|
| |
This will be reused for if statements.
|
|
|
|
|
|
| |
I wanted to reuse the comparison stuff for nir_ifs, but for that I just
want the flags and no destination value. Splitting the conditions from
the destinations ended up cleaning the existing code up, anyway.
|
|
|
|
|
|
| |
We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation. Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
|
|
|
|
|
|
| |
Fixes failures in
dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d
in the GLES3.1 suite.
|
|
|
|
|
|
| |
Fixes
dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat
and others.
|
|
|
|
|
|
| |
This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
|
|
|
|
| |
Noticed while debugging a testcase.
|
|
|
|
|
|
|
|
|
| |
The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.
total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)
|
|
|
|
|
| |
This was just generated work for vir_opt_dead_code and cluttered up the
dumps.
|
|
|
|
| |
I wanted to reuse it for DCE of flags updates.
|
|
|
|
|
| |
It is just shifting probably-means-flags bits out of a value; it doesn't
actually update the flags on its own.
|
|
|
|
|
| |
This was for shader-db, but I haven't cared about NIR instruction counts
in a long time.
|
|
|
|
|
|
|
| |
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.
|
|
|
|
|
|
|
|
|
| |
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
|
|
|
|
| |
Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")
|
|
|
|
| |
The shadow state is now in the sampler.
|
|
|
|
|
|
| |
Now that V3D has 8 byte per pixel formats exposed, we've got stride==32
utiles to load and store. Just handle them through the non-NEON paths for
now.
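A non-NEON path of the kind mentioned above can be sketched as a row-by-row copy that works for any utile stride, including 32. The 64-byte utile size and all names here are assumptions for illustration, not the driver's actual function.

```c
#include <stdint.h>
#include <string.h>

#define UTILE_BYTES 64 /* assumed size of one utile in bytes */

/*
 * Copy one utile from tiled GPU memory into a linear CPU image.
 * utile_stride is the byte length of one utile row (e.g. 32);
 * cpu_stride is the byte pitch of the linear destination.
 */
static void load_utile_generic(uint8_t *cpu, const uint8_t *gpu,
                               uint32_t cpu_stride, uint32_t utile_stride)
{
    for (uint32_t off = 0; off < UTILE_BYTES; off += utile_stride) {
        memcpy(cpu, gpu + off, utile_stride);
        cpu += cpu_stride;
    }
}
```

With a stride of 32 the loop runs twice, copying two 32-byte rows per utile.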
|