| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
However only if the file exists in DEBUG_OUTPUT_DIR. The expectation is
that AR rasterizerLauncher will start placing it there when launching
a workload (which is in a subsequent checkin)
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Make sure we initialize variables.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
expecting <8xi16> return.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
|
| |
- 64-bit cvt-to-float needs to be explicitly handled
- gathers need the right parameter types to work with doubles
Fixes draw-vertices piglit tests
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
VecEqual and VecHash
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
ALLOCA pointer elements, not pointers.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
previous change in generators necessitates this change
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
for the case when USE_SIMD16_SHADERS == FALSE
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros. We need to refactor this to shader record setup at compile time,
anyway.
Fixes ext_framebuffer_multisample-interpolation * centroid-*
|
|
|
|
|
| |
Our TF outputs always start at 6 or 7 currently, so we don't hit the
broken 8 case. Let's make sure that doesn't change somehow.
|
|
|
|
|
|
| |
This should fix help with intermittent GPU hangs in tests switching
formats while rendering small frames. Unfortunately, it didn't help with
the tests I'm having troubles with.
|
|
|
|
|
|
|
|
|
| |
s/attibute/attribute/
s/suface/surface/
v2: rebased(Leo)
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
VASurfaceAttribExternalBuffers.pitches is indexed by
plane. Current implementation only supports single plane layout.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Previous bit-fields assignments are incorrect and will result certain mpeg4
decode failed due to wrong flag values. This patch fixes these assignments.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The driver may have a reference on the separate stencil buffer for some
reason (like an unflushed job using it), so we can't directly free the
resource and should instead just decrement the refcount that we own.
Fixes double-free in KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8
on vc5.
Fixes: e94eb5e6000e ("gallium/util: add u_transfer_helper")
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
| |
Like for stores, we need to emit a separate load_general packet.
|
|
|
|
|
|
| |
The internal-type-bpp path is for surfaces that get stored in the raw TLB
format. For 4.x, we're storing MSAA as just 2x width/height at the
original format.
|
|
|
|
| |
Fixes piglit fbo-depthstencil blit default_fb
|
| |
|
|
|
|
|
| |
For single-sample we have to always program SAMPLE_0, but for multisample
we want to store all the samples.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when
the sign is the same but both numbers are sufficiently small
(if the product is smaller than 2^-128).
This could apparently lead to emitting a sufficient amount of
additional bogus vertices to overflow the allocated array for them,
hitting an assertion (still safe with release builds since we just
aborted clipping after the assertion in this case - I'm however unsure
if this is now really no longer possible, so that code stays).
Not sure if the additional vertices could cause other grief, I didn't
see anything wrong even when hitting the assertion.
Essentially, both +-0 are treated as positive (the vertex is considered
to be inside the clip volume for this plane), so integrate the logic
determining different sign into the branch there.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Simplifies the logic when to emit null tris (albeit the reasons why we
have to do this remain unclear).
This is strictly just logic simplification, the behavior doesn't change
at all.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some analysis suggests that all short immediates are sign-extended. The
insnCanLoad logic already accounted for this, but we could still pick
the wrong form when emitting actual instructions that support both short
and long immediates (with the long form usually having additional
restrictions that insnCanLoad should be aware of).
This also reverses a bunch of commits that had previously "worked
around" this issue in various emitters:
9c63224540ef: gm107/ir: make use of ADD32I for all immediates
83a4f28dc27b: gm107/ir: make use of LOP32I for all immediates
b84c97587b4a: gm107/ir: make use of IMUL32I for all immediates
d30768025a22: gk110/ir: make use of IMUL32I for all immediates
as well as the original import for UMUL in the nvc0 emitter.
Reported-by: Karol Herbst <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Tested-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have already required 0.44 for building clover and swr, so it was
already partially required. This just makes it required across the board
instead of just for clover and swr.
There is a bug in 0.44 which makes it impossible to build mesa in some
configurations, so require 0.44.1 which fixes this.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This one's completely my fault, I didn't do good enough testing after
rebasing and this got missed.
Fixes: d28c24650110c130008be3d3fe584520ff00ceb1
("meson: build graw tests")
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Imad needs to set a read barrier.
With significant big work groups I was getting wrong results for div u32. Turns
out the issue was with the sched opcodes.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Without this, we'd assertion fail in u_transfer_helper when mapping an
imported resource.
|
|
|
|
|
| |
The kernel shouldn't return a bo at NULL, and the HW special-cases NULL
address values for things like OQs.
|
|
|
|
|
| |
The kernel won't return us BOs at offset 0 (because things like OQs
wouldn't work there), so we shouldn't in the simulator either.
|
|
|
|
|
| |
Otherwise we'd crash immediately upon importing a BO through EGL
interfaces.
|
|
|
|
|
| |
We don't have any kernel metadata about BO tiling, so this probably is all
we should do for the moment.
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit dab02dea3411d325a5aee6cda5b581e61396ecc6.
It causes crashes of qtcreator and firefox.
Fixes: dab02de "st/dri: Fix dangling pointer to a destroyed dri_drawable"
Cc: 18.0 18.1 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc) that also
gives very useful information in which passes all the time was spent,
and which passes are really run along with the order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Conversion to int can otherwise overflow if compile times are over
~71min. (Yes this can happen...)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to move things when it previously couldn't,
which causes noticeable compile time increases.
There's more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
It also in particular helps some crazy shader used for driver
verification (don't ask...) a lot (about factor of 6 faster in compile
time) (due to simplfiying the ir before LICM is run).
While here, also move licm behind simplifycfg. For some shaders there
seems to be very significant compile time gains (we've seen a factor
of 10000 albeit that was a really crazy shader you'd certainly never
see in a real app), beause LICM is quite expensive and there's cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
This refactors the code out to share it between radv and radeonsi.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This just makes this common code between the two drivers.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
We can just unreachable here, this aligns with radv code, makes
it easier to move to common code.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an EGLSurface is created, made current and destroyed, and then a second
EGLSurface is created. Then the second malloc in driCreateNewDrawable may
return the same pointer address the first surface's drawable had.
Consequently, when dri_make_current later tries to determine if it should
update the texture_stamp it compares the surface's drawable pointer against
the drawable in the last call to dri_make_current and assumes it's the same
surface (which it isn't).
When texture_stamp is left unset, then dri_st_framebuffer_validate thinks
it has already called update_drawable_info for that drawable, leaving it
unvalidated and this is when bad things starts to happen. In my case it
manifested itself by the width and height of the surface being unset.
This is fixed this by setting the pointer to NULL before freeing the
surface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
Signed-off-by: Johan Klokkhammer Helsing <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Cc: 18.0 18.1 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For nv50 we coalesce the srcs and defs into a single node. As such, we
can end up with impossible constraints if the source is referenced
after the tex operation (which, due to the coalescing of values, will
have overwritten it).
This logic already exists for inserting moves for MERGE/UNION sources.
It's the exact same idea here, so leverage that code, which also
includes a few optimizations around not extending live ranges
unnecessarily.
Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test
Signed-off-by: Ilia Mirkin <[email protected]>
|