| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros. We need to refactor this to shader record setup at compile time,
anyway.
Fixes ext_framebuffer_multisample-interpolation * centroid-*
|
|
|
|
| |
We don't use ldunifa yet, but we will eventually for UBOs.
|
|
|
|
| |
We don't use TMUWT yet, but we will once we do SSBOs.
|
|
|
|
|
|
|
| |
The next shader gets to start writing the register file during these
slots, so make sure we don't stomp over them.
The only case of hitting this that I could imagine would be dead writes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
util_is_power_of_two_or_zero
The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
| |
In the absence of a general NIR or VIR-level scheduler, this at least
avoids spilling in
GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
|
|
|
|
| |
Just like TLB without a config uniform, we don't have a register index.
|
|
|
|
|
| |
Fixes failure in
GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size
|
|
|
|
|
|
| |
Our backend needs some sort of vertex position value to emit the scaled
viewport values and such. Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type. Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
|
|
|
|
|
| |
As you're debugging register allocation, you may have changed the
intervals and not recomputed yet. Just skip the dump in that case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).
We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.
Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
|
|
|
|
| |
The point was to get the MOV, which the MOV_dest already returned.
|
|
|
|
| |
This is nice for debugging when you've made a bad instruction.
|
|
|
|
|
| |
This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.
|
|
|
|
| |
Anywhere we want to multiply, we probably want this.
|
| |
|
|
|
|
| |
Otherwise our start/ends ips won't line up with the actual instructions.
|
|
|
|
| |
This will be reused in register spilling.
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Obviously it would be good to have an ADD and a MUL and a signal together,
but we can even potentially have multiple signals merged, as well.
total instructions in shared programs: 100423 -> 97874 (-2.54%)
instructions in affected programs: 78812 -> 76263 (-3.23%)
|
|
|
|
|
|
|
|
| |
We emit some MOVs to track lifetimes of payload registers, but we don't
need there to be actual MOV instructions for them.
total instructions in shared programs: 101045 -> 100423 (-0.62%)
instructions in affected programs: 37083 -> 36461 (-1.68%)
|
|
|
|
| |
I must have misplaced it in the instruction packing rework.
|
|
|
|
| |
We don't have a src1 to look up if the compare instruction is "i2b".
|
|
|
|
|
|
|
| |
This will be used for freedreno and vc4 which require all inputs
and outputs to be copied to temps.
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
|
|
|
|
| |
Prevents potential infinite loops when a non-dispatched or discarded
channel never triggers the loop break condition.
|
|
|
|
|
| |
I think this should be equivalent other than power, and it's the kind of
comparison we use for nir_op_ieq.
|
|
|
|
| |
I was trying to do a NULL-destination UF, and it got removed.
|
|
|
|
|
|
|
| |
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
|
| |
|
|
|
|
|
| |
V3D 4.x texturing changes enough that #ifdefs would just make a mess of
it.
|
|
|
|
|
| |
For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to
stop including V3.3 XML.
|
|
|
|
|
| |
This is a major performance boost on all of V3D, but is required on V3D
4.x where shaders are always either 2- or 4-threaded.
|
|
|
|
|
|
|
|
|
|
| |
This fills in the delay slots of thread end as much as we can (other than
being cautious about potential TLBZ writes).
In the process, I moved the thread end THRSW instruction creation to the
scheduler. Once we start emitting THRSWs in the shader, we need to
schedule the thread-end one differently from other THRSWs, so having it in
there makes that easy.
|
|
|
|
| |
Apparently the VPM writes need to be flushed out before we end the shader.
|
|
|
|
|
| |
This is needed for LDVPM on V3D 4.x, but will also be needed for keeping
values out of the accumulators across THRSW.
|
|
|
|
|
|
|
|
|
|
|
| |
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.
VPM loads also gain the ability to be reordered by packing the row into
the A argument. They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
|
|
|
|
|
|
|
| |
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <[email protected]>
Signed-off-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
I want to do the SETMSF.IFA to discard only if execute == 0 and cond, so
our dest of the PUSHZ needs to be nonzero if execute or !cond are nonzero.
Fixes dEQP-GLES3.functional.shaders.discard.dynamic_loop_dynamic.
|
|
|
|
|
| |
Fixes a simulator assertion failure on
dEQP-GLES3.functional.fragment_out.array.fixed.r8_highp_float.
|
|
|
|
|
|
|
|
|
|
| |
This means that with no flatshading we'll emit the single-byte
ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to
get all the bits we need set.
There's a _SET enum in the packet we could use to possibly set entire
ranges of the bitfield without using another packet, but this at least
fixes the conformance failure.
|
|
|
|
|
|
|
|
| |
In updating the simulator, behavior changed slightly so that our old code
wasn't getting glxgears's flatshading interpolated right. Emit flat
shading code just like we would for a normal flat-shaded varying, by
passing a flag in the shader key for glShadeModel(GL_FLAT) state and
customizing the color inputs based on that.
|
|
|
|
|
|
|
|
| |
The compiler decides how many LDTMUs we're going to emit, and that must
match the P1 flags. This brings the return channel counting to a single
place (so all that's passed into the compiler is "how many return channels
you may request from this texture's format), and was a necessary step for
shadow samplers once we stop using OVRTMUOUT=0.
|
|
|
|
|
|
|
|
| |
Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.
Reviewed-by: Ian Romanick <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.
Fixes piglit tex-miplevel-selection *Lod 2D.
|
|
|
|
|
| |
This should fix some GPU hangs in our (currently always single-threaded)
fragment shaders, and definitely fixes assertion failures in simulation.
|