| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Comment was right, implementation was wrong ;-)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103128
Fixes: cad959d90145 ("gallium: add LDEXP TGSI instruction and corresponding cap")
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
v2: copy input primitive
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating a context without SetPixelFormat() don't blindly take the
pixel format reported by GDI. Instead, look for our own closest pixel
format.
Minor clean-ups added by Brian Paul.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103412
Reviewed-by: Brian Paul <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We only have a single stencil read mask and write mask. Issue a
warning if different front/back values are used. The Piglit
gl-2.0-two-sided-stencil test hits this.
Reviewed-by: Neha Bhende <[email protected]>
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sampler TS is an hardware optimization that can be used when rendering
to textures. After rendering to a resource with TS enabled, the
texture unit can use this to bypass lookups to empty tiles. This also
means a resolve-in-place can be avoided to flush the TS.
This commit is also an optimization when not using sampler TS, as
resolve-in-place will now be skipped if a resource has no (valid) TS.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is to make sure that the TS is properly flushed to memory before
rendering to a new surface starts.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Sampler TS introduces yet another format enumeration for
renderable+textureable formats. Introduce it into the etnaviv_format
table as another column.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Resources only need a resolve-to-itself if their TS is valid for any
level, not just if it happens to be allocated.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045feeef8cad155f1c9aa07f166e146e3d00), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).
v2: set it in all shader stages.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.
For images I will port the rest of the code.
Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
We need a similar thing for indirect draws.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Stumbled into this when adding a new PIPE_CAP.
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.
Reviewed-by: Bruce Cherniak <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
Speed up avx512 platforms; fixes performance regression caused
by swithc to simdlib.
Reviewed-by: Bruce Cherniak <[email protected]>
Cc: [email protected]
|
|
|
|
| |
v2: remove r600_can_dma_copy_buffer
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using build_by_default : false is convenient for dependencies that can
be pulled in by various diverse components of the build system, the
gallium hardware/software drivers and state trackers do not fit that
description. Instead, these should be guarded using the variable that tracks
whether that driver should be enabled.
This leaves a few helper libraries: trace, rbug, etc, and the generic
winsys bits as `build_by_default : false` because there are a large
number of gallium components that pull them in.
v2: - remove build_by_default from winsys convenience libs as well.
v3: - Always put drivers before winsys for consistency
Signed-off-by: Dylan Baker <[email protected]>
Tested-by: Lionel Landwerlin <[email protected]> (v1)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldExtract.shader_test
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This handles the bits >= 32 corner case in bitfieldInsert.
Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Like
radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
evergreen hw suffers from the same problem, so rotate the
geometry inputs to fix this.
This fixes:
./bin/glsl-1.50-geometry-primitive-types GL_TRIANGLE_STRIP_ADJACENCY
on evergreen.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As per radeonsi, the tess factor components for isolines
are reversed.
Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
r0 in input into vertex shaders contains things like vertexid,
we need to reserve it even if we have no inputs.
This fixes a bunch of tessellation piglits.
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise we end up emitting the fence.
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
v2: include the file also in the meson.build (Eric Engestrom).
Fixes: f1e1c60ff6 ("etnaviv: Update from rnndb")
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
In case an instruction only writes one register, and it is .x, we can
skip the extra level of fanout indirection.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Seems to fix dEQP compute related tests.. and matches what i965 does, so
perhaps there is some assumption that std430 packing is on by default
somewhere in NIR?
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new pass that inserts additional dependencies, rather than simply
relying on SSA srcs added in the nir->ir3 frontend. This makes it
easier to deal with barriers, but the additional false deps also lets us
deal properly with ensuring a write depends on all previous reads.
Since conversion to barrier instructions is lossy (ie. just knowing the
instruction doesn't tell us enough about what other instructions the
barrier applies to), use barrier_class/barrier_conflict fields in the
ir3_instruction to retain this information.
This could probably be relaxed somewhat by considering *which* array/
buffer/image variable is being referenced. Ie. a write to buffer A
can overtake a read from buffer B, if B is not coherent. (right?)
Signed-off-by: Rob Clark <[email protected]>
|