| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The compiler doesn't know that num_visuals > 0.
Fixes: 37a8d907cc16 ("egl/gbm: Ensure EGLConfigs match GBM surface format")
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flush a resource's previous write_batch synchronously. Because a
resource's associated batches are not updated until after the flush
thread submits rendering to the kernel, this was causing a bit of
confusion in the following loop. This fixes a bug that appeared with
recent stk.
Perhaps we need to re-work things a bit to clear out dependent patches
in the ctx's thread and use a fence to deal with the period between
when a flush is queued and when it is submitted to the kernel. But
this will do until time permits a larger refactor.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because of loops, we can't schedule all of a block's predecessors first.
Instead just assume that the result consumed in a block was written far
enough away in all paths into a block. And do an intra-block scheduling
pass to figure out if there are any cases where we need to insert extra
nop's. This works out better than always assuming the worst case (ie.
that a value live into a block was written in the last instruction in
the predecessor block).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Account for the move to predicate register, to try to avoid needing to
insert extra NOPs later.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally false-deps are not something to consider, since they mostly
exist for delay-slot related reasons:
* barriers
* ordering writes after read
* SSBO/image access ordering
The exception is a false-dependency on an array store.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we didn't handle flow control in legalize, and instead just
set (ss)(sy) on the first instruction in every block. Which isn't very
clever.
Instead, consider output state of all predecessor blocks, so we only
set a sync bit if needed for any possible path leading into a block.
Because of loops, we can't require that all successor blocks are
legalized before a given block, so instead run in a loop until results
converge.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Useful in the following patches.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Maybe there is a better way for this.. where it comes useful is "array"
loads, which end up as a false-dep for a later array store.
If all the uses of an array load are CP'd into their consumer, it still
leaves the dangling array load, leading to funny things like:
mov.u32u32 r5.y, r0.y
mov.u32u32 r5.y, r0.z
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Consider also immediates for swapping the first two srcs, because they
can be lowered to constant.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Now that we convert phi webs to ssa, we can drop all this.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Now that it is unused.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Generally seems to do worse on instruction count and register usage,
according to shader-db. But shader-db also doesn't do a very good job
of weighting loop bodies, so that might not be totally valid.
So add an env variable to enable GCM pass for easier experimentation.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
There are more useful nir passes added since initial conversion to nir.
But ir3 was never updated to use them.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Agressively lowering all if/else to selects in some extreme cases
results in much higher register pressure. Using peephole select instead
with a modest threshold speeds up alu2 4x!
16 seems like a good limit, low enough to help alu2 but not too low that
it penalizes everything else. With a bit better scheduling of the
instruction that moves a value into a predicate register, we might be
able to lower this limit a bit more in the future, but since we need 6
cycles from the move to predicate register to predicated branch, that
puts some sort of lower bound on how far we can lower this threshold.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
nir's from_ssa pass is much better at avoiding inserting extra moves
than our logic is. And lowering phi webs to regs just treats anything
involved in a phi web as an array of length=1. Which with previous
array related fixes in RA/etc ends up working out quite well. This cuts
down on extra instructions and also helps with register pressure.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Makes it easier to compare values seen in-game (where there are many
shaders) to cmdline standalone compiler.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead, if possible fold (sat) flag into src, otherwise use:
(sat)max.f rD, rS, rS
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Seems to be there since a3xx, but we always lowered fsat. But we can
shave some instructions, especially in shaders that use lots of
clamp(foo, 0.0, 1.0) by not lowering fsat.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Use livein state of other blocks to extend liverange of arrays when they
are still needed by successor blocks.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
(Plus a couple TODOs)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Since these are not in SSA form, add to block's keeps so it doesn't
appear unused.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
When eliminating movs, the instruction that is now directly using the
src of the mov has the same scheduling order constraints as the original
mov instruction.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Function ends after this if/else ladder, so it was pointless.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The number of bits depends on generation. But printing negative values
with a5xx encoding (largest size) but compiling for a3xx or a4xx, would
result in negative values printed as large positive values.
I guess in practice huge negative branch offsets aren't likely (and if
that is the case, the shader is probably too big to grok by reading the
assembly). So just print using smallest bitfield size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Try to clean up things like:
br !p0.x #2
br p0.x #something
to eliminate the first branch.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
When trying to optimize to reduce stalls, it is nice to see this info.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Fix the following:
warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but
argument 3 has type ‘uint64_t {aka long long unsigned int}.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: include compute shader support
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We move the nir check before the shader cache call so that we can
call a nir based caching function in a following patch.
Also with this change we simply check if vertex shaders support
NIR rather than looping over the stages as mixing of shader types
is not supported anyway.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Clover now checks PIPE_SHADER_CAP_SUPPORTED_IRS for native support instead.
This change indirectly enables NIR support for compute shaders
on radeonsi.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
We now use PIPE_SHADER_CAP_SUPPORTED_IRS to check for native support
in clover.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
PIPE_SHADER_CAP_PREFERRED_IR was conflicting with PIPE_SHADER_IR_NIR
for compute shaders, so we let clover pick the one it wants to use.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Acked-by: Pierre Moreau <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Fixes a number of draw buffers piglit tests.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following piglit test:
./bin/arb_vertex_attrib_64bit-check-explicit-location -auto -fbo
Where we would end up with the nir such as:
vec1 64 ssa_11 = pack_64_2x32_split ssa_9, ssa_10
vec1 32 ssa_12 = f2f32 ssa_2
And our pack_64_2x32_split nir to llvm code always produces
a 64bit integer as output.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Previously the asserts did not take swizzles into account.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit 712332ed54f14b5ee34c2990e351ca48992488b2, which
caused over 90k failures in Mesa i965 CI.
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we create an EGL window surface on a GBM surface, ensure that the
EGLConfig is compatible with the GBM format, notwithstanding XRGB/ARGB
interchange.
For example, rendering with an XRGB8888 EGLConfig on to an ARGB8888
gbm_surface (and vice-versa) are acceptable, but rendering with an
XRGB2101010 EGLConfig on to an XRGB8888 gbm_surface will now be
rejected.
This was previously allowed through; when 10bpc formats were enabled,
clients which picked a completely random EGL config and hoped/assumed
they were XRGB8888 would break.
If you have bisected a failure to start a GBM/KMS client to this commit,
please look at its EGLConfig selection (e.g. through eglChooseConfigs),
and add an EGL_NATIVE_VISUAL_ID == gbm_surface format match to the
attribs for config selection.
Signed-off-by: Daniel Stone <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have mask/channel information in gbm_dri's format conversion
table, we can remove the copy in EGL.
As this table contains more formats (notably including R8 and RG8, which
can be used for BO but not surface allocation), we now compare the masks
of all channels when trying to find a suitable config. Without doing
this, an XRGB8888 EGLConfig would match on an R8 format.
Signed-off-by: Daniel Stone <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Daniel Stone <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
|