| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
tgsi_to_nir emits things with arrays as global vars.. and nir->ir3 does
lower_locals_to_regs. But nothing was lowering global to local, which
breaks compiling tgsi shaders
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since we were MARK flag for both preventing loops, and tracking whether
instructions were used, we could end up in an infinite loop due to
bd2ca2bcdd. Instead invert the logic.. mark all instructions UNUSED
up front and clear the flag as we visit them.
Fixes: bd2ca2bcdd freedreno/ir3: eliminate unused false-deps
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Buffers can be large, so we probably don't want to make them all 32x
bigger. But they can't be rendered to (at least in GL) so we don't
need this workaround to prevent page faults on mem<->gmem.
Cc: "18.0" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We could alternatively fall back to using "old style" draw's for
mem<->gmem (ie. what <= a4xx do) when height is not aligned to 32,
but that is somewhat more work (and not really something that could
be applied to stable)
Cc: "18.0" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes an issue that became possible when we started lowering phi webs to
regs (a7ea2b4e) (although was not really seen until we also switched to
using peephole select pass (ec8bc54a) instead of lowering *all* if/else
to select).
If texture coord (or anything else that uses create_collect() to collect
scalar values in a sequence of scalar registers) was consuming a value
produced on either side of an if/else (ie. a phi lowered to nir reg,
which in ir3 is an "array" of length 1) then register allocation would
happen incorrectly and we'd end up sampling from garbage coordinates.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Some instructions require src/dst to be in full or half precision
register depending on src/dst type. So do a better job of propagating
register type.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We'll also need to be able to create a half-precision immediate. So
re-work create_immed(). Prep work for following patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Prep work for following patch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously false-dependencies would get flagged as used, even if the
only "use" was a false dep to (for example) prevent a load from being
scheduled after a store.
In addition to being pointless instructions, in some cases they can
cause problems. For example, ldg (and similar instructions) depend on
an immed arg getting CP'd into the instruction, but this doesn't happen
if an instruction is otherwise unused. Which can result in undefined
results (overwriting unintended registers).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Avoids a misleading "INVALID FLAGS" warning in debug builds.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This is also useful to see if optmsgs are enabled.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Use DOT2ADDv instruction with 0.0f constant add.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Extend translate_sge_slt to emit these, in analogous fashion
but using CNDEv.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for:
- PIPE_FORMAT_ETC1_RGB8
- PIPE_FORMAT_DXT1_RGB
- PIPE_FORMAT_DXT1_RGBA
- PIPE_FORMAT_DXT3_RGBA
- PIPE_FORMAT_DXT5_RGBA
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Denormalized texture coordinates are required for text rendering in
GALLIUM_HUD.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Textures will sometimes be updated if texture view state was
un-set, without this change that causes an assertion crash or
segfault.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Compose swizzles using util_format_compose_swizzles instead
of the custom code (which somehow had a bug).
This makes the GL_ALPHA internal format work.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change use of BLEND_ to BLEND2_,
BLEND_* a3xx_rb_blend_opcode
BLEND2_* is a2xx_rb_blend_opcode
This makes no effective difference as the used enumerant has the same
value (0), but the other enumerants do not match 1-to-1 so this will
avoid future problems.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The format enumeration comes comes from the yamoto
register headers that are part of the amd-gpu kernel driver.
(see freedreno envytools commit b8fb7978e7ae106d0d11d0b238ab2ba2d4dd9d43)
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
util_is_power_of_two_or_zero
The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Generated with
git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'
and some manual fixing in nir_intrinsics.h
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some instructions, assume src and/or dst is half-precision based on a
type field (ie. f32/s32/u32 are full precision but others are half
precision). So add some code to sanity check the src/dst registers to
catch mixups.
Also propagate half-precision flag for SSA sources. The instruction
consuming a SSA value needs to be of the same type as the one producing
it.
This is probably not complete half-precision support, but a useful first
step. We do still need to add support for nir alu instructions for
converting between half/full precision.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It isn't just vertex shaders that need to fixup reg footprint for inputs
populated before shader starts.
This problem showed up with compute shaders. If you have (for example)
a localregid sysval, but only the .x component is used, the hw still
writes the .yz components, which could overflow into other threads
causing corruption. Showed up in cl cts 'basic/test_basic intmath_int'.
But in theory the same problem could crop up elsewhere.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
At least for clover.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Not *entirely* sure why this is a different BIND bit, but it is.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
I think this should also always only occur at the end of a BB (by
definition), and the BB successor should be the end block.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Temporary hack, but since we can't do 64b math yet in ir3, pretend that
we don't support 64b pointers.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Was hitting an assert with vs-varying-array-mat4-index-col-row-wr.shader_test
When eliminating a copy, we were dropping the use_count of the mov that
is skipped, but not increasing the use_count of it's src instruction.
Fixes: 76440fcca91 freedreno/ir3: clean up dangling false-dep's
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Required by radeonsi for optimal behavior.
|
|
|
|
|
|
|
|
|
| |
This fixes a race condition in building targets that link in freedreno.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105120
Fixes: 0bbecc5a8548883f76a7 ("meson: define driver dependencies")
Signed-off-by: Dylan Baker <[email protected]>
Acked-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Nobody queries these and nobody sets them to anything useful,
the docs say TODO.
Drop them until a use appears.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flush a resource's previous write_batch synchronously. Because a
resource's associated batches are not updated until after the flush
thread submits rendering to the kernel, this was causing a bit of
confusion in the following loop. This fixes a bug that appeared with
recent stk.
Perhaps we need to re-work things a bit to clear out dependent patches
in the ctx's thread and use a fence to deal with the period between
when a flush is queued and when it is submitted to the kernel. But
this will do until time permits a larger refactor.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of loops, we can't schedule all of a block's predecessors first.
Instead just assume that the result consumed in a block was written far
enough away in all paths into a block. And do an intra-block scheduling
pass to figure out if there are any cases where we need to insert extra
nop's. This works out better than always assuming the worst case (ie.
that a value live into a block was written in the last instruction in
the predecessor block).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Account for the move to predicate register, to try to avoid needing to
insert extra NOPs later.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally false-deps are not something to consider, since they mostly
exist for delay-slot related reasons:
* barriers
* ordering writes after read
* SSBO/image access ordering
The exception is a false-dependency on an array store.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we didn't handle flow control in legalize, and instead just
set (ss)(sy) on the first instruction in every block. Which isn't very
clever.
Instead, consider output state of all predecessor blocks, so we only
set a sync bit if needed for any possible path leading into a block.
Because of loops, we can't require that all successor blocks are
legalized before a given block, so instead run in a loop until results
converge.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Useful in the following patches.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Maybe there is a better way for this.. where it comes useful is "array"
loads, which end up as a false-dep for a later array store.
If all the uses of an array load are CP'd into their consumer, it still
leaves the dangling array load, leading to funny things like:
mov.u32u32 r5.y, r0.y
mov.u32u32 r5.y, r0.z
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Consider also immediates for swapping the first two srcs, because they
can be lowered to constant.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Now that we convert phi webs to ssa, we can drop all this.
Signed-off-by: Rob Clark <[email protected]>
|