| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
If an instruction using address register value gets eliminated, we need
to remove it from the indirects list, otherwise it causes mayhem in
sched for scheduling address register usage.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A handful of fixes and cleanups:
1) If we split addr/pred, we need the newly created instruction to
end up in the unscheduled_list
2) Avoid scheduling a write to the address register if there is no
instruction using the address register that is otherwise ready
to schedule. Note that I currently don't bother with the same
logic for predicate register, since the only instructions using
predicate (br/kill) don't take any other src registers, so this
situation should not arise.
3) few other cosmetic cleanups
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We need to distinguish a shader that has separate writes to each MRT
from one which is supposed to write the data from MRT 0 to all the MRTs.
In TGSI this is done with a property. NIR doesn't have that, so encode
it as a funny location and decode on the other end.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions). Cache this information so we only
figure it out once.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fanin source could be grouped, for example with shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL SVIEW[0], 2D, FLOAT
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 2D
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
The second arg to the isaml is IN[1].w, so we need to look at the fanin
source to get the correct offset.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Split out most of dump_info() from ir3_cmdline compiler into a function
that can be used both by cmdline compiler and also for the disasm debug
option. This way, for FD_MESA_DEBUG=disasm we also get to see intput/
output registers, etc.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Some piglit tests, like arb_fragment_program-sparse-samplers, result in
having a null samp#0 but valid samp#1.
TODO: a3xx probably needs similar fix
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We can't rely on what we get from the assembler if we have indirect
addressing of constant file, since the assembler doesn't know the array
index. This got lost in the transition to NIR.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3. But we were still generating four fanouts.
Which meant that RA would see it as two different register classes,
depending on the path to definer. Ie. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.
Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.
Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1. But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used. So bunch of work for probably no real gain.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Seems like a4xx gets this right.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We get this information from NIR (which gets it from sview decl in tgsi
when translating from tgsi), so no need to maintain shader variants for
this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)
One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phi's ourself. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Without this, negative branch/jump offsets look like very large positive
offsets.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
These belong in the shader, rather than the block. Mostly a lot of
churn and nothing too interesting. But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context. But we will need this also to
hold the reg-set for new register allocation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
No longer used, or even possible, with NIR frontend.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Also remove ir3_flatten which was only used by tgsi f/e.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Use a more standard priority-queue based scheduling algo. It is simpler
and will make things easier once we have multiple basic blocks and flow
control.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use standard list_head double-linked list and related iterators,
helpers, etc, rather than weird combo of instruction array and next
pointers depending on stage. Now block has an instrs_list. In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
At least for now.. right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Use ir3_MOV() builder in a couple of spots, rather than open-coding the
instruction construction. Also add ir3_NOP() builder and use that
instead of open coding.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than forcing everyone to provide their own definition of the symbol
provide a common (dummy) one.
This helps us resolve the build of the standalone pipe-drivers (amongst
others), which are missing the symbol.
Cc: Rob Clark <[email protected]>
Cc: "10.6" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Just like every other place in gallium.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Cc: Rob Clark <[email protected]>
Cc: "10.6" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Softpipe, llvmpipe, r300g, and radeonsi pass tests. Other drivers need testing.
Freedreno and nv30 are definitely broken. Other drivers seem to be alright.
|
|
|
|
|
|
|
|
| |
Fixes non-determinism in bin/point-sprite rendering, and the stars on
the intro screen to neverball.
Cc: "10.6" <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
get_immediate will return a const reference, the requested immediate
isn't necessarily in the x slot. Make sure to use the swizzle.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics. However, this set to 1
by every pass that ever creates a load/store intrinsic. Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1. Let's just delete it.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It's a remnant of some old NV extension. Unused.
I also have a patch that removes predicates if anyone is interested.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
A fence can outlive the ctx, so we shouldn't deref the ctx to get at the
screen. We need some updates in libdrm_freedreno API to completely
handle fences properly, but this is at least an improvement.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was causing corruption with hw binning on a306. Unlikely that it
is a306 specific, but rather the smaller gmem size resulted in different
tile configuration which was triggering the bug at certain resolutions.
Signed-off-by: Rob Clark <[email protected]>
Cc: "10.4" and "10.5" and "10.6" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whitelist adreno 306 (as found in msm8916/apq8016). Works pretty much
out of the box, although the smaller GMEM size requires more tiles to
fit 1920x1080, so bump up the max # of tiles as well.
Since it is just whitelist + trivial change, it makes sense to land on
all the active release branches.
Note that a305c ends up with gpu-id "306", hence a306 ends up with
gpu-id of "307". Apparently that is what happens when you let the
marketing dept name things.
Cc: "10.4" and "10.5" and "10.6" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Our lower if/else pass was missed when converting NIR to use linked
lists rather than hashsets to track use/def sets.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The linked list in gallium is pretty much the kernel list and we would like
to have a C-based linked list for all of mesa. Let's not duplicate and
just steal the gallium one.
Acked-by: Connor Abbott <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
GL_AMD_performance_monitor must return an error when a monitoring
session cannot be started.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows queries to return different numeric types.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Martin Peres <[email protected]>
|
|
|
|
|
|
|
|
| |
When there is a colormask active that does not cover all the channels,
enable reading in the destination like with a combining blend
operation. This fixes fbo-blending-formats on a3xx.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Enables ARB_depth_buffer_float. There is no sampling support for
interleaved Z32F_S8, so we store the two textures separately, one as
Z32F, the other as S8. As a result, we need a lot of additional logic
for restores and transfers.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
32-bit depth buffers are stored as unorm, and thus need special handling
when moving to and from gmem. They are copied into gmem by writing
depth, and resolved from gmem using a special resolve bit which
apparently float-ifies the data.
Signed-off-by: Ilia Mirkin <[email protected]>
|