| Commit message (Collapse) | Author | Age | Files | Lines |
| |
There's a levelZero flag which forces texturing to pick level zero (and
not consume an explicit LOD argument). This is set for MS targets, but
could also be set for any other incoming instruction. As that is what
determines whether a LOD argument is present, check that rather than the
more indirect isMS logic.
Signed-off-by: Ilia Mirkin <[email protected]>
| |
Ever since a long time ago when I messed around with fences, I ensure
that after a PUSH_SPACE call there is enough space to write a fence out
into the pushbuf.
However the PUSH_SPACE macro is not all-knowing, and so sometimes we
have to invoke nouveau_pushbuf_space manually with the relocs/pushes
args set. If we don't take the extra allocation from PUSH_SPACE into
account, then we will end up accidentally flushing when the code was not
expecting a flush. This can lead to various runtime and rendering
failures.
The amount of extra allocation isn't that important - it has to be at
least 8 based on the current nouveau_winsys.h setting, but even more
won't hurt. I just rounded up to powers of 2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99354
Cc: "12.0 13.0" <[email protected]>
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Ben Skeggs <[email protected]>
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
| |
Code is taken from a combination of radv (for the more basic functions,
to avoid gallivm dependencies) and radeonsi (for the new and improved
derivative calculations).
v2: add 0.5 offset to tex coords only after derivative calculation
v3:
- really only touch the first three coordinates
- rebase on the removal of the 1.5 --> 0.5 offset change
Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2)
Reviewed-by: Marek Olšák <[email protected]>
| |
Sourcing coords_arg[4] is actually never correct, since bias is handled
differently in tex_fetch_args anyway.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
| |
As remarked by the comment in the original code, the old algorithm fails when
(tc + deriv) points at a different cube face. Instead, simply project the
derivative directly to the plane of the selected cube face.
The new code is based on exactly differentiating (using the chain rule)
the projection onto a plane corresponding to a fixed cube map face (which
is still selected in the usual way based on the texture coordinate itself).
The computations end up fairly involved, but we do save two reciprocal
computations.
Fixes GL45-CTS.texture_cube_map_array.sampling.
v2: add 0.5 offset to tex coords only after derivative calculation
v3: go back to 1.5 offset
Reviewed-by: Bas Nieuwenhuizen <[email protected]> (v2)
Reviewed-by: Marek Olšák <[email protected]>
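A minimal sketch of the chain-rule idea described above, assuming the
selected face has z as its major axis and ignoring the per-face sign flips
and the scale/offset applied afterwards. The function and parameter names
are hypothetical and are not taken from the radeonsi code:

```c
#include <assert.h>

/* Hypothetical helper: differentiate the projection s = x / |z|, t = y / |z|
 * for a face that stays fixed (z is the major axis), using the quotient rule.
 * v is the cube coordinate, dv its screen-space derivative. */
static void
face_project_deriv(const float v[3], const float dv[3], float dst[2])
{
   assert(v[2] != 0.0f);

   /* The face is fixed, so the sign of z is treated as a constant. */
   float sgn = v[2] < 0.0f ? -1.0f : 1.0f;
   float ma = sgn * v[2];    /* |z| */
   float dma = sgn * dv[2];  /* derivative of |z| on this face */
   float inv_ma = 1.0f / ma; /* single reciprocal shared by both components */

   /* d(x / ma) = (dx - (x / ma) * dma) / ma, and likewise for y. */
   dst[0] = (dv[0] - v[0] * inv_ma * dma) * inv_ma;
   dst[1] = (dv[1] - v[1] * inv_ma * dma) * inv_ma;
}
```

Because the face is chosen from the texture coordinate alone, the result
stays in the plane of that face even when (tc + deriv) would land on a
neighbouring face.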
| |
v2: fix compile error that snuck in during rebase
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
| |
b838f642 "ac/debug: Move sid_tables.h generation to common code." moved
sid_tables.h but forgot the corresponding .gitignore.
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Cc: [email protected]
Cc: Bruce Cherniak <[email protected]>
Signed-off-by: Chuck Atkins <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
| |
The generated file is correctly stored in the builddir as of an earlier
commit. Yet that commit forgot to add the respective include flag, so
the compiler would error out, failing to find sid_tables.h.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99389
Fixes: d1dc22eb466 "ac: automake: rework sid_tables.h generation"
Signed-off-by: Emil Velikov <[email protected]>
| |
When the flag D3DCREATE_MULTITHREAD is set, a global mutex is used
to protect nine calls.
However for performance reasons, AddRef and Release didn't hold the mutex,
and instead used atomics.
Unfortunately at item release, the item can be destroyed, and that
destruction path should be protected by a mutex (at least for
some objects).
Without this patch, it is possible for an app thread to be in a dtor
while another thread is making gallium nine calls. It is then possible
that two threads are using the same gallium pipe, which is forbidden.
The problem has been made worse with csmt, because it can cause a hang,
since nine_csmt_process is not threadsafe.
Fixes Hitman hang, and possibly others.
Signed-off-by: Axel Davy <[email protected]>
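A minimal sketch of the locking pattern described above, assuming a
hypothetical object type and a hypothetical global mutex (none of these
names come from the nine state tracker, and the actual patch may take the
lock at a different point): the refcount itself stays atomic, but the
destruction path runs under the global mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the global mutex used with D3DCREATE_MULTITHREAD. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical refcounted object with a destructor callback. */
struct object {
   atomic_int refs;
   void (*dtor)(struct object *);
};

/* Example destructor that only counts how often it ran. */
static int destroyed_count;
static void
count_dtor(struct object *o)
{
   (void)o;
   destroyed_count++;
}

static int
object_addref(struct object *o)
{
   return atomic_fetch_add(&o->refs, 1) + 1;
}

static int
object_release(struct object *o)
{
   int refs = atomic_fetch_sub(&o->refs, 1) - 1;

   if (refs == 0) {
      /* Only the destruction path takes the mutex, so the dtor cannot
       * run concurrently with another thread's locked calls. */
      pthread_mutex_lock(&global_mutex);
      o->dtor(o);
      pthread_mutex_unlock(&global_mutex);
   }
   return refs;
}
```

AddRef and non-final Release calls remain lock-free in this sketch; only
the call that drops the last reference pays for the mutex.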
| |
Flush the queue to get refcounts right, and properly
release the items, instead of throwing away all pending
commands.
Signed-off-by: Axel Davy <[email protected]>
| |
Some nine_state_* and nine_context_* functions
used for Reset() require that all pending commands
be flushed.
Signed-off-by: Axel Davy <[email protected]>
| |
nine_context uses NineSurface9 fields, thus we need to flush
pending commands using the surface before changing the fields.
Signed-off-by: Axel Davy <[email protected]>
| |
Create both surfaces in one call.
Signed-off-by: Axel Davy <[email protected]>
| |
There is no need to check on csmt_active before
calling nine_csmt_process, because the function
checks already.
Signed-off-by: Axel Davy <[email protected]>
| |
When the dirty region is empty, u_box_union_* incorrectly expands
the new region.
This fixes a broken font rendering issue in WOLF RPG Editor v2.10 games.
Signed-off-by: Masanori Kakura <[email protected]>
Reviewed-by: Axel Davy <[email protected]>
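A minimal sketch of the empty-region case described above, assuming a
hypothetical 2D box type and union helper (these are not the u_box_union_*
functions themselves): an empty dirty region must not contribute its origin
to the union, otherwise the result grows to cover it.

```c
#include <assert.h>

/* Hypothetical 2D box, for illustration only. */
struct box2d {
   int x, y;
   int width, height;
};

static int
box2d_is_empty(const struct box2d *b)
{
   return b->width <= 0 || b->height <= 0;
}

/* Union of a and b that treats an empty box as "no region at all".
 * dst may alias a or b. */
static void
box2d_union(struct box2d *dst, const struct box2d *a, const struct box2d *b)
{
   assert(dst && a && b);

   if (box2d_is_empty(a)) {
      *dst = *b;
      return;
   }
   if (box2d_is_empty(b)) {
      *dst = *a;
      return;
   }

   /* Compute into locals first so that aliasing dst is safe. */
   int x0 = a->x < b->x ? a->x : b->x;
   int y0 = a->y < b->y ? a->y : b->y;
   int ax1 = a->x + a->width, bx1 = b->x + b->width;
   int ay1 = a->y + a->height, by1 = b->y + b->height;
   int x1 = ax1 > bx1 ? ax1 : bx1;
   int y1 = ay1 > by1 ? ay1 : by1;

   dst->x = x0;
   dst->y = y0;
   dst->width = x1 - x0;
   dst->height = y1 - y0;
}
```

With an empty dirty region at the origin and an update at (10, 10) of size
5x5, a plain min/max union would yield a 15x15 box starting at (0, 0); the
sketch returns just the 5x5 update.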
| |
... and list the public header within it.
Signed-off-by: Emil Velikov <[email protected]>
| |
Note: the currently mentioned etnaviv_utils.h is a typo.
Signed-off-by: Emil Velikov <[email protected]>
| |
Changes from V1 -> V2:
- updated Copyright
- added $(top_srcdir)/src/gallium/winsys to include path (suggested by Emil)
- adapted driver to new renderonly API
Signed-off-by: Christian Gmeiner <[email protected]>
Acked-by: Emil Velikov <[email protected]>
| |
This driver supports a wide range of Vivante IP cores like GC880,
GC1000, GC2000 and GC3000.
Changes from V1 -> V2:
- added missing files to actually integrate the driver into build system.
- adapted driver to new renderonly API
Signed-off-by: Christian Gmeiner <[email protected]>
Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Acked-by: Emil Velikov <[email protected]>
| |
This is a very lightweight library to add basic support for renderonly
GPUs. A kms gallium driver must specify how a renderonly_scanout
object gets created. It must also provide file handles to the kms
device and the gpu device in use.
This could look like:
    struct renderonly ro = {
       .create_for_resource = renderonly_create_gpu_import_for_resource,
       .kms_fd = fd,
       .gpu_fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC)
    };
The renderonly_scanout object exists for two reasons:
- Do any special treatment for a scanout resource like importing the
GPU resource into the scanout hw.
- Make it easier for a gallium driver to detect if anything special
needs to be done in flush_resource(..) like a resolve to linear.
A GPU gallium driver which gets used as renderonly GPU needs to be
aware of the renderonly library.
This library will likely break Android support and hopefully will get
replaced with a better solution based on gbm2.
Changes from V1 -> V2:
- reworked the lifecycle of renderonly object (suggested by Nicolai Hähnle)
- killed the midlayer (suggested by Thierry Reding)
- made the API more explicit regarding gpu and kms fd's
- added some docs
Signed-off-by: Christian Gmeiner <[email protected]>
Acked-by: Emil Velikov <[email protected]>
Tested-by: Alexandre Courbot <[email protected]>
| |
Defer delete on regular resources. This ensures that any work being done
on the resource is completed before freeing up the resource's memory.
Reviewed-by: Bruce Cherniak <[email protected]>
| |
Although arb_shader_image_load_store-atomicity will most likely
hang your box, I think it's now quite reasonable to enable GL 4.3
on Maxwell/Pascal GPUs. I suspect that test to be wrong because
it doesn't even work on the NVIDIA blob.
I have tested a bunch of benchmarks (UE4 demos) and real games
like Shadow of Mordor and they all work fine.
Signed-off-by: Samuel Pitoiset <[email protected]>
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
| |
Yes, IMUL/IMAD require dependency barriers and we should
definitely replace these instructions with XMAD, but the
different flags need to be figured out. Note that XMAD only
supports 16-bit integers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
| |
This makes use of scheduling control codes which are very useful
for improving the instruction pipelining.
This patch will increase performance on Maxwell GPUs by at least
x1.5, and up to x3.5 for some benchmarks.
Although this has been fairly well tested, I would not be surprised
if someone hit a corner case somewhere. For that reason, the scheduler
is enabled by default but it can be deactivated by using
NV50_PROG_SCHED=0.
Thanks to Scott Gray for the reverse engineering work available from
https://github.com/NervanaSystems/maxas/wiki/Control-Codes.
Signed-off-by: Samuel Pitoiset <[email protected]>
Acked-by: Pierre Moreau <[email protected]>
Tested-by: Alexandre Courbot <[email protected]>
Tested-by: Jan Vesely <[email protected]>
| |
It's actually useless to insert those texture barriers post RA
because the current control code (i.e. st 0x0) will wait for all
dependencies before issuing a new instruction.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Pierre Moreau <[email protected]>
| |
The old setting didn't hurt, but this is cleaner.
Reviewed-by: Marek Olšák <[email protected]>
| |
Without this check, the kernel::bind() method would fail with a
std::out_of_range exception, letting an exception escape from the
library into the client, rather than returning the corresponding error
code CL_INVALID_PROGRAM_EXECUTABLE.
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
| |
parse_identifier does not stop copying from '*pcur'
until it encounters the NUL terminator. Since 'ret' is a
fixed-size buffer, a long string in '*pcur' causes
a buffer overflow. This patch avoids that.
Signed-off-by: Li Qiang <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Marc-André Lureau <[email protected]>
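A minimal sketch of a bounded copy of this kind, assuming a hypothetical
helper copy_identifier() with an explicit buffer size. The real
parse_identifier may differ in signature, in which characters it accepts,
and in how it reports failure:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not the Mesa function: copy an identifier
 * (letters, digits, underscores) from *pcur into ret, never writing
 * more than ret_size bytes including the terminating NUL.
 * Returns 1 on success, 0 if nothing was copied or it did not fit. */
static int
copy_identifier(const char **pcur, char *ret, size_t ret_size)
{
   const char *cur = *pcur;
   size_t i = 0;

   assert(ret_size > 0);

   while (isalnum((unsigned char)*cur) || *cur == '_') {
      if (i == ret_size - 1)
         return 0; /* too long: fail instead of overflowing ret */
      ret[i++] = *cur++;
   }
   ret[i] = '\0';

   /* Only advance the caller's cursor when the copy succeeded. */
   *pcur = cur;
   return i > 0;
}
```

The check happens before each store, so the last byte of the buffer is
always left free for the terminator.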
| |
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
| |
Sometimes it is useful to disable the "growable" cmdstream buffers for
debugging. (See 419a154d in libdrm)
Signed-off-by: Rob Clark <[email protected]>
| |
Now that the issue glamor was hitting w/ glsl>=130 (aka the missing
INSTANCED bit in vertex attribute state) is fixed, remove the hack.
Signed-off-by: Rob Clark <[email protected]>
| |
Add missing bit, now that we know where it is.
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
Signed-off-by: Rob Clark <[email protected]>
| |
This is for handling chained command buffers and secondary command
buffers. It doesn't handle the trace id for secondary command buffers
yet, but I don't think that is possible in general with just writes,
as we could call a secondary command buffer multiple times.
I think this is good enough for now, as the most useful case is the
chaining when we grow an IB.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
| |
v2: do it properly
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98238
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
This has no effect because no code uses those members with ranged decls.
Tested-by: Edmondo Tommasina <[email protected]>
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Tested-by: Edmondo Tommasina <[email protected]>
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
r600g doesn't set pipe_context::clear_buffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99303
Reviewed-by: Alex Deucher <[email protected]>
| |
Generally we should do the transpose after conversion if the format has less
than 32 bits per channel (if it has 32 bits, conversion is going to be a no-op
anyway...). This is obviously because there are fewer vectors to deal with.
Though the advantage for 16 bit formats isn't that big, and in fact with AVX
there isn't really any (as the 32bit unpacks can be done with 256bit, but
the smaller ones cannot, although that would change again with proper AVX2
support).
Only makes sense for 2d and not 1d cases. And to keep things easy, only handle
1,2 and 4 channels (rgbx is just fine).
For rgba unorm8 format the backend conversion sums up to these instruction
totals (not counting the movs for SSE2 due to 2-op syntax - generally every 2
unpacks need an additional mov).
                     SSE2               AVX
transpose:           32 unpack          16 unpack
untwiddle:           0                  8 (128bit low/high permutes)
convert:             16 mul + 16 cvt    8 mul + 8 cvt
32->8bit:            12 pack            8 (128bit extract) + 12 pack
When doing transpose/untwiddle afterwards we get:
                     SSE2               AVX
convert:             16 mul + 16 cvt    8 mul + 8 cvt
32->8bit:            12 pack            8 (128bit extract) + 12 pack
transpose/untwiddle: 12 unpack          12 unpack
So for SSE2, this drops 20 unpacks (total instruction count 76->56)
whereas for AVX it replaces the 16 256bit unpacks with 8 128bit ones
and drops the 8 lo/hi permutes (in total 60->48). (Albeit to be fair,
the permutes could be dropped even when doing the transpose first,
they are extremely pointless but we'd need to be able to tell
lp_build_conv to reorder the vectors, for AVX2 we're going to need to
be able to tell lp_build_conv about ordering in any case.)
(With different ordering going into conversion, it would be possible
to do 4 unpacks + 4 pshufbs instead of 12 unpacks, but that might not
be better, and not all cpus can do it. Proper AVX2 support should eliminate
the 8 128bit extracts, reduce these 12 packs to 6 and the 12 unpacks to 2
pshufb + 2 permq ideally (+ 2 final 128bit extracts).)
Reviewed-by: Jose Fonseca <[email protected]>
| |
This special packing path can be easily extended to handle not just
float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8
(i.e. all interesting cases for llvmpipe fs backend code).
The packing parts all stay the same (only the last step packing will
be signed->signed instead of signed->unsigned but luckily even sse2 can do
both).
While here, also note some bugs in this path (we keep the bugs identical to
what we did before on x86, although other archs may differ). In particular,
for float->unorm8, values that are too large will still get clamped to 0, not
255, and for float->snorm8, NaNs will end up as -1, not 0 (but we do the clamp
against 1.0 there to prevent too-large values from ending up as -1.0 - this is
inconsistent with the unorm8 handling but is what we ended up with before, and
I'm not sure we can get away without it). This is quite fishy in any case, as
we depend on arch-dependent behavior of the iround (my understanding is that
with altivec the conversion would actually saturate, although I've no idea
about NaNs, so we probably wouldn't need to do anything for snorm).
(There are only minimal piglit tests for unorm clamping behavior AFAICT, in
particular nothing seems to test values which are too large to be handled by
the float->int conversion.)
For uint32->uint8 we also do a min against MAX_INT, since the source for
the packs is always signed (again, on x86 - should probably be able to
express these arch-dependent bits better some day).
Reviewed-by: Jose Fonseca <[email protected]>
|