| Commit message (Collapse) | Author | Age | Files | Lines |
| |
This looks like an evergreen-specific feature: for atomic counters,
AMD hardware has dedicated counters that are used instead of operating
on buffers directly. These are separate from the buffer atomics,
so they require different limits and code paths.
I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.
This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.
I'd like to land this and the st patch, so that I can start to
upstream the evergreen support for these and other GL4.x features.
v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
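The v2 note above (use the number of HW counters the driver reports, and
fall back to the current mode when it is 0) can be illustrated with a
minimal sketch. Everything here is hypothetical: `fake_driver`,
`atomic_path` and `choose_atomic_path` are invented for illustration and
are not the CAP names or the Gallium interface added by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's answer to "how many hardware
 * atomic counters do you expose?". Not the real Gallium interface. */
struct fake_driver {
   unsigned max_hw_atomic_counters; /* 0 means: no dedicated HW counters */
};

enum atomic_path {
   ATOMIC_PATH_BUFFER,     /* current mode: atomics operate on buffers */
   ATOMIC_PATH_HW_COUNTER  /* dedicated hardware counters */
};

/* No separate "mode" query is needed: a non-zero counter limit selects
 * the hardware-counter path, and 0 keeps the buffer-based path. */
static enum atomic_path
choose_atomic_path(const struct fake_driver *drv)
{
   assert(drv != NULL);
   return drv->max_hw_atomic_counters > 0 ? ATOMIC_PATH_HW_COUNTER
                                          : ATOMIC_PATH_BUFFER;
}
```

The point of the sketch is only that a single limit can double as the mode
selector, which is why a dedicated mode CAP becomes redundant.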
| |
This isn't needed in r600 anymore.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
| |
st_src_reg is a class, not a struct. Simply remove 'struct' to silence
a MSVC compiler warning (class vs. struct mismatch).
Reviewed-by: Charmaine Lee <[email protected]>
| |
Also fix local variable declarations and replace -1 with BUFFER_NONE.
No Piglit changes.
Reviewed-by: Charmaine Lee <[email protected]>
| |
BUFFER_NONE is -1 so no reason for GLint.
Reviewed-by: Charmaine Lee <[email protected]>
| |
This function should probably be moved elsewhere, too.
Reviewed-by: Charmaine Lee <[email protected]>
| |
Remove trailing whitespace, fix indentation, wrap lines to 78 columns, etc.
Reviewed-by: Charmaine Lee <[email protected]>
| |
Similar to what we did for pixel shader threads - see gen_device_info.c.
We don't want to bump the actual Maximum Number of Threads though, so
we adjust it here. For pixel shaders, we don't use max_wm_threads, so
we could just bump it globally.
Supposedly fixes Piglit tests:
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec3-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec4-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-u64vec4-uint64_t
Reviewed-by: Jordan Justen <[email protected]>
| |
This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all
BOs placed in the mesa BO cache as purgeable so that the system can
reclaim this memory under memory pressure.
v2:
- Removed BOs from the cache when they've been purged by the kernel
- Check whether the madvise ioctl is supported or not before using it
v3: Don't walk the whole list when we find a busy BO (by anholt, acked by
Boris)
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
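A minimal sketch of the cache behaviour described above, using a simulated
kernel rather than the real DRM_IOCTL_VC4_GEM_MADVISE ioctl. All names
(`fake_bo`, `fake_madvise`, `cache_put`, `cache_get`) are hypothetical, and
the check for ioctl support mentioned in v2 is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer object. "purged" stands in for the
 * kernel having reclaimed the backing pages. */
struct fake_bo {
   bool purgeable;  /* marked as not needed while it sits in the cache */
   bool purged;     /* set by the simulated kernel under memory pressure */
};

/* Simulated madvise: returns true if the BO contents were retained. */
static bool
fake_madvise(struct fake_bo *bo, bool will_need)
{
   if (will_need) {
      bo->purgeable = false;
      return !bo->purged;
   }
   bo->purgeable = true;
   return true;
}

/* Simulated memory pressure: only purgeable BOs can be reclaimed. */
static void
fake_memory_pressure(struct fake_bo *bo)
{
   if (bo->purgeable)
      bo->purged = true;
}

/* Putting a BO into the cache marks it purgeable. */
static void
cache_put(struct fake_bo *bo)
{
   fake_madvise(bo, false);
}

/* Taking a BO out of the cache: returns the BO if it is still usable, or
 * NULL if it was purged and has to be dropped from the cache instead. */
static struct fake_bo *
cache_get(struct fake_bo *bo)
{
   return fake_madvise(bo, true) ? bo : NULL;
}
```

A caller that gets NULL back would free the stale cache entry and allocate
a fresh BO, which corresponds to the "removed BOs from the cache when
they've been purged" item in v2.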
| |
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Tested-by: Timothy Arceri <[email protected]>
| |
Somehow on my cross build the -pthread is getting lost. All the other
deps seem to work out fine.
Reviewed-by: Dylan Baker <[email protected]>
Tested-by: Timothy Arceri <[email protected]>
| |
Pushed ahead of things actually working.
This reverts commit 5293b96b160b904c0e53cbce93679c3aa090f846.
| |
160 -> 136 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
1752 -> 1736 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
216 -> 160 bytes
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
| |
This more or less ports EGL_KHR_no_config_context to GLX.
v2: Enable the extension only for those backends that support it.
Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
| |
This should be safe as these backends already support the EGL version of
this extension. DRI1 is not affected because it does not support
GLX_ARB_create_context anyway. DRI-Windows is not prepared to implement
this as there's no equivalent WGL extension, and wglCreateContextAttribs
seems to really want the HDC's pixel format to be set.
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
| |
Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
| |
Fixes non-deterministic failures in
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_sync.images.texture_source.teximage2d_render
and others in dEQP-EGL.functional.sharing.gles2.multithread.*
Reviewed-by: Marek Olšák <[email protected]>
| |
The current value was introduced in commit a27180d0d8666, which claims
that it represents ~1.11 years. However, it is interpreted in nanoseconds,
so it actually only represents ~9.8 hours. That seems a bit short.
Use the largest value consistent with both int32 and int64. It
corresponds to ~292 years in nanoseconds.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
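The unit arithmetic above can be checked with a short sketch. The constants
are assumptions for illustration: 35,000,000,000,000 is a round stand-in
for the old value (the exact constant is not quoted here), and
0x7fffffff7fffffff is one plausible reading of "largest value consistent
with both int32 and int64", namely a value whose 32-bit halves are both
INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND    1000000000.0
#define SECONDS_PER_HOUR 3600.0
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Convert a nanosecond count to hours. */
static double
ns_to_hours(uint64_t ns)
{
   return (double)ns / NS_PER_SECOND / SECONDS_PER_HOUR;
}

/* Convert a nanosecond count to (Julian) years. */
static double
ns_to_years(uint64_t ns)
{
   return (double)ns / NS_PER_SECOND / SECONDS_PER_YEAR;
}

/* Illustrative stand-in for the old value: roughly 1.11 years if read as
 * microseconds, but under 10 hours if read as nanoseconds. */
static const uint64_t old_like_value = UINT64_C(35000000000000);

/* Assumed candidate for the new value: both 32-bit halves are INT32_MAX,
 * so it is positive as an int64 and as a truncated int32. */
static const uint64_t candidate_timeout_ns = UINT64_C(0x7fffffff7fffffff);
```

With these helpers, `ns_to_hours(old_like_value)` comes out a little under
10 hours, while `ns_to_years(candidate_timeout_ns)` comes out at roughly
292 years, matching the figures in the message.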
| |
st_flush should flush state tracker-internal state and the pipe, but
not mesa/main state. Of the four callers:
- glFlush/glFinish already call FLUSH_{VERTICES,STATE}.
- st_vdpau doesn't need to call them.
- st_manager will now call them explicitly.
Reviewed-by: Marek Olšák <[email protected]>
| |
There may be pending operations (e.g. vertices) that need to be flushed
by the state tracker.
Found by inspection.
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Transfer commands can have associated GPU operations.
Enabled by passing GALLIUM_DDEBUG=transfers.
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
This patch has multiple goals:
1. Off-load the writing of records in 'always' mode to another thread
for performance.
2. Allow using ddebug with threaded contexts. This really forces us to
move some of the "after_draw" handling into another thread.
3. Simplify the different modes of ddebug, both in the code and in
the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
no 'pipelined' anymore, since we're always pipelined; and 'noflush'
is replaced by 'flush', since we no longer flush by default.
4. Fix the fences in pipelining mode. They previously relied on writes
via pipe_context::clear_buffer. However, on radeonsi, those could
(quite reasonably) end up in the SDMA buffer. So we use the newly
added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.
5. Improve pipelined mode overall, using the finer grained information
provided by the new fences.
Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).
An example of the new hang debug output:
  Gallium debugger active.
  Hang detection timeout is 1000ms.
  GPU hang detected, collecting information...
  Draw #   driver  prev BOP  TOP  BOP  dump file
  -------------------------------------------------------------
  2        YES     YES       YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000000
  3        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000001
  4        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000002
  5        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000003
  Done.
We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.
(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)
Acked-by: Marek Olšák <[email protected]>
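The reading of the hang table described above (a draw is in flight when its
top-of-pipe fence has signaled but its bottom-of-pipe fence has not, and
the first such draw is the most likely culprit) can be sketched as follows.
The `draw_record` struct and helper names are hypothetical and are not
taken from the ddebug code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-draw record mirroring the TOP and BOP columns. */
struct draw_record {
   unsigned draw;         /* draw number as printed in the table */
   bool top_signaled;     /* top-of-pipe fence signaled */
   bool bottom_signaled;  /* bottom-of-pipe fence signaled */
};

/* A draw is in flight if it has started but not finished. */
static bool
draw_in_flight(const struct draw_record *r)
{
   return r->top_signaled && !r->bottom_signaled;
}

/* Count the draws that were in flight when the hang was detected. */
static int
count_in_flight(const struct draw_record *recs, int count)
{
   int n = 0;
   for (int i = 0; i < count; i++) {
      if (draw_in_flight(&recs[i]))
         n++;
   }
   return n;
}

/* Index of the first in-flight draw (the usual suspect), or -1 if none. */
static int
first_suspect(const struct draw_record *recs, int count)
{
   for (int i = 0; i < count; i++) {
      if (draw_in_flight(&recs[i]))
         return i;
   }
   return -1;
}
```

As the message notes, the first in-flight draw is only the most likely
cause, and a draw whose top-of-pipe fence has signaled may still be waiting
on earlier work, so the result is a starting point rather than a verdict.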
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Change format to %p while we're at it.
Reviewed-by: Marek Olšák <[email protected]>
| |
v2: use uncached system memory for the fence, and use the CPU to
clear it so we never read garbage when checking the fence
Reviewed-by: Marek Olšák <[email protected]>
| |
v2: remove the change to si_fence_server_sync, we'll handle that more
robustly
Reviewed-by: Marek Olšák <[email protected]>
| |
For running post-draw operations inside the driver thread. ddebug will
use it.
Reviewed-by: Marek Olšák <[email protected]>
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
Queries should still get marked as flushed when flushes are executed
asynchronously in the driver thread.
To this end, the management of the unflushed_queries list is moved into
the driver thread.
Reviewed-by: Marek Olšák <[email protected]>
| |
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.
v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu
Reviewed-by: Marek Olšák <[email protected]>
| |
The driver uses (and must use) the flushed flag of queries as a hint that
it does not have to check for synchronization with currently queued up
commands. Deferred flushes do not actually flush queued up commands, so
we must not set the flushed flag for them.
Found by inspection.
Reviewed-by: Marek Olšák <[email protected]>
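A minimal sketch of the rule described above: queries are marked as flushed
only when the flush really submits the queued commands. The flag
`FAKE_FLUSH_DEFERRED` (modeled loosely on a deferred-flush flag), the
`fake_query` struct and the helper are all hypothetical names, not the
driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flush flag: the flush is deferred, i.e. the queued
 * commands are not actually submitted yet. */
#define FAKE_FLUSH_DEFERRED (1u << 0)

/* Hypothetical query object carrying only the "flushed" hint. */
struct fake_query {
   bool flushed;  /* hint: no need to check against queued commands */
};

/* Called as part of a flush. For a deferred flush the queued commands
 * are still pending, so the hint must stay unset. */
static void
mark_queries_after_flush(struct fake_query *queries, size_t count,
                         unsigned flags)
{
   if (flags & FAKE_FLUSH_DEFERRED)
      return;

   for (size_t i = 0; i < count; i++)
      queries[i].flushed = true;
}
```

Leaving the hint unset after a deferred flush is the conservative choice:
the driver then still checks for synchronization with queued commands
before reading the query result.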
| |
Reviewed-by: Marek Olšák <[email protected]>
| |
The idea is to fix the following interleaving of operations
that can arise from deferred fences:
  Thread 1 / Context 1          Thread 2 / Context 2
  --------------------          --------------------
  f = deferred flush
  <------- application-side synchronization ------->
                                fence_server_sync(f)
                                ...
                                flush()
  flush()
We will now stall in fence_server_sync until the flush of context 1
has completed.
This scenario was unlikely to occur previously, because applications
seem to be doing
  Thread 1 / Context 1          Thread 2 / Context 2
  --------------------          --------------------
  f = glFenceSync()
  glFlush()
  <------- application-side synchronization ------->
                                glWaitSync(f)
... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.
As a side effect, we no longer busy-wait on submission_in_progress.
We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.
Reviewed-by: Marek Olšák <[email protected]>
| |
These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).
Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.
Reviewed-by: Marek Olšák <[email protected]>
| |
Also document some subtleties of pipe_context::flush.
Reviewed-by: Marek Olšák <[email protected]>
| |
v2:
- style fixes
- fix missing timeout handling in futex path
Reviewed-by: Marek Olšák <[email protected]>
| |
C11 threads were changed to use struct timespec instead of xtime, and
thrd_sleep got a second argument.
See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1554.htm and
http://en.cppreference.com/w/c/thread/{thrd_sleep,cnd_timedwait,mtx_timedlock}
Note that cnd_timedwait is spec'd to be relative to TIME_UTC / CLOCK_REALTIME.
v2: Fix Windows build errors. Tested with a default Appveyor config
that uses Visual Studio 2013. Judging from Brian's email and
random internet sources, Visual Studio 2015 does have timespec
and timespec_get, hence the _MSC_VER-based guard which I have
not tested.
Cc: Jose Fonseca <[email protected]>
Cc: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v1)
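Since cnd_timedwait takes an absolute time point based on TIME_UTC,
callers holding a relative timeout have to convert it first. A minimal
sketch of that conversion, using only the standard `timespec_get` from
`<time.h>`. The helper names `timespec_add_ns` and `deadline_from_now` are
hypothetical, and the timeout is assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add a non-negative relative timeout in nanoseconds to a base time and
 * keep tv_nsec normalized to the range [0, 1e9). */
static struct timespec
timespec_add_ns(struct timespec base, int64_t timeout_ns)
{
   struct timespec out;
   int64_t nsec = (int64_t)base.tv_nsec + timeout_ns % NSEC_PER_SEC;

   out.tv_sec = base.tv_sec + (time_t)(timeout_ns / NSEC_PER_SEC);
   if (nsec >= NSEC_PER_SEC) {
      nsec -= NSEC_PER_SEC;
      out.tv_sec += 1;
   }
   out.tv_nsec = (long)nsec;
   return out;
}

/* Compute an absolute TIME_UTC deadline "timeout_ns from now", in the
 * form expected by cnd_timedwait or mtx_timedlock. Returns 0 on success
 * and -1 if the current time could not be obtained. */
static int
deadline_from_now(int64_t timeout_ns, struct timespec *deadline)
{
   struct timespec now;

   /* timespec_get returns the time base on success and 0 on failure. */
   if (timespec_get(&now, TIME_UTC) != TIME_UTC)
      return -1;

   *deadline = timespec_add_ns(now, timeout_ns);
   return 0;
}
```

Because the deadline is tied to wall-clock time, a clock adjustment during
the wait can lengthen or shorten the effective timeout. That is a property
of the C11 interface rather than of this sketch.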
| |
Cc: Jose Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>