| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Supported on R700 and up.
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear. For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
|
|
|
|
|
|
|
| |
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.
Cc: "11.0" <[email protected]>
|
|
|
|
|
|
|
| |
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
There are a few non-stoney changes too.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
v2: set emit_scratch_reloc, add a NULL check
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
v2: don't call get_flush_flags twice per function
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
This should improve performance for big copies that need to be split.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
Nothing actually uses this yet (due to complications), but the emission
logic is right.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This removes the hack used for merge, which only covers a fraction of
the cases.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Need to emulate rcp/rsq before providing full fp64 support
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.
If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
No instructions are able to load short immediates like nvc0 can.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We just needed to set the extra width/height fields to get this working.
v2 (chk): rebased, CC stable added, commit message added, fixed coding style
Signed-off-by: Boyuan Zhang <[email protected]>
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Cc: "10.6 11.0" <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Boyan Ding <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.
The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
There are some weird problems with 8-wide vectors.
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This greatly increases the pressure you can put on the driver before
create fails. Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.
Cc: "11.0" <[email protected]>
|
|
|
|
|
| |
It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be suprising.
|
|
|
|
| |
60MB of cached BOs are a lot less scary than 600MB.
|
|
|
|
|
|
|
|
|
|
| |
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white. We had this implemented
for VGPU9, but not VGPU10.
VMware bug 1545492.
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
The decision which method to use is determined by the size of the vector
that is used by the platform.
For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.
For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.
This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).
This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.
The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:
- glxspheres, on ppc64le:
- original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
- with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
- Additional 0.8% performance boost
- glxspheres, on x86-64 without AVX:
- original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
- with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
- Additional 0.74% performance boost
- glmark2, on ppc64le:
- original code: score of 58
- with my change: score of 57
- glmark2, on x86-64 without AVX:
- original code: score of 175
- with the patch: score of 167
- Impact of of -4.5% on performance
- OpenArena, on ppc64le:
- original code: 3398 frames 1719.0 seconds 2.0 fps
255.0/505.9/2773.0/0.0 ms
- with the patch: 3398 frames 1690.4 seconds 2.0 fps
241.0/497.5/2563.0/0.2 ms
- 29 seconds faster with the patch, which is about 2%
- OpenArena, on x86-64 without AVX:
- original code: 3398 frames 239.6 seconds 14.2 fps
38.0/70.5/719.0/14.6 ms
- with the patch: 3398 frames 244.4 seconds 13.9 fps
38.0/71.9/697.0/14.3 ms
- 0.3 fps slower with the patch (about 2%)
Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
Signed-off-by: Oded Gabbay <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
This fixes crashes with some piglit OpenCL tests.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|