| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
There were two bugs working together to make things mostly work: I wasn't
dividing the VPM output size available by the size of a batch (vertex),
but I also had the size of the VPM reduced by a factor of 8.
Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it
seems also my intermittent varying failures.
Fixes: 1561e4984eb0 ("v3d: Emit the VCM_CACHE_SIZE packet.")
|
|
|
|
|
|
|
| |
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
| |
Fixes bad rendering when forcing 2 spills in glxgears.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment. In
fact, we were likely to to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
The simulator complained that we had write responses outstanding at shader
end. It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
|
| |
Found while forcing some spilling, which creates a lot of short
tmua->thrsw->ldtmu sequences.
Cc: "18.2" <[email protected]>
|
|
|
|
|
|
| |
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
|
|
|
|
|
|
| |
This instruction is used to ensure that TMU stores have been processed
before moving on. In particular, you need any TMU ops to be done by the
time the shader ends.
|
|
|
|
|
| |
Reported-by: Rob Clark <[email protected]>
Fixes: e92959c4e03c ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
|
|
|
|
|
|
|
| |
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
|
|
|
|
|
|
| |
Sometimes when iterating over sources, we might want to check if it's the
implicit one. We wouldn't want to match on a non-implicit src using this
function.
|
|
|
|
|
|
| |
shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs: 20702 -> 20195 (-2.45%)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
|
|
|
|
|
| |
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
|
|
|
|
| |
We lower saturates in NIR.
|
|
|
|
|
|
|
|
|
|
|
| |
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
|
|
|
|
|
|
|
|
|
| |
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
|
|
|
|
|
| |
I keep having to ignore these shader-db changes since I don't trust them,
so just disable the reports entirely.
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs: 27571 -> 27112 (-1.66%)
and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
|
|
|
|
| |
The docs had an update noting this restriction, so reflect it in the code.
|
|
|
|
|
| |
This doesn't affect us yet since we're not doing TMUWTs, but I think we
will for GLES 3.1.
|
|
|
|
|
| |
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
|
|
|
|
| |
Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one
|
|
|
|
|
| |
We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.
|
|
|
|
|
|
| |
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4. This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
|
|
|
|
|
|
| |
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
|
|
|
|
|
|
|
|
| |
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"
Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
|
|
|
|
| |
Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
|
| |
|
|
|
|
|
| |
Fixes segfaults and undefined behavior in
dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
|
|
|
|
|
|
|
|
|
|
| |
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.
v2: Fix commit message typo.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros. We need to refactor this to shader record setup at compile time,
anyway.
Fixes ext_framebuffer_multisample-interpolation * centroid-*
|
|
|
|
| |
We don't use ldunifa yet, but we will eventually for UBOs.
|
|
|
|
| |
We don't use TMUWT yet, but we will once we do SSBOs.
|
|
|
|
|
|
|
| |
The next shader gets to start writing the register file during these
slots, so make sure we don't stomp over them.
The only case of hitting this that I could imagine would be dead writes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
util_is_power_of_two_or_zero
The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
In the absence of a general NIR or VIR-level scheduler, this at least
avoids spilling in
GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
|
|
|
|
| |
Just like TLB without a config uniform, we don't have a register index.
|
|
|
|
|
| |
Fixes failure in
GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
|
|
|
|
|
|
| |
Our backend needs some sort of vertex position value to emit the scaled
viewport values and such. Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type. Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
|
|
|
|
|
| |
As you're debugging register allocation, you may have changed the
intervals and not recomputed yet. Just skip the dump in that case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).
We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.
Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
|
|
|
|
| |
The point was to get the MOV, which the MOV_dest already returned.
|
|
|
|
| |
This is nice for debugging when you've made a bad instruction.
|
|
|
|
|
| |
This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.
|
|
|
|
| |
Anywhere we want to multiply, we probably want this.
|