mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	gallium: add CAPs to support HW atomic counters. (v3)	Dave Airlie	2017-11-10	15	-1/+34
	This looks like an evergreen specific feature, but with atomic counters AMD have hw specific counters they use instead of operating on buffers directly. These are separate to the buffer atomics, so require different limits and code paths.

	I've left the CAP for atomic type extensible in case someone else has a variant on this sort of thing (freedreno maybe?) and needs to change it.

	This adds all the CAPs required to add support for those atomic counters, along with a related CAP for limiting the number of output resources.

	I'd like to land this and the st patch then I can start to upstream the evergreen support for these and other GL4.x features.

	v2: drop the ATOMIC_COUNTER_MODE cap, just use the return from the HW counters. If 0 we use the current mode.
	v3: fix some rebase errors (Gert Wollny)

	Reviewed-by: Nicolai Hähnle <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
	Tested-By: Gert Wollny <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
*	r600/query: drop rest of vi workaround code.	Dave Airlie	2017-11-10	2	-37/+13
	This isn't needed in r600 anymore.

	Reviewed-by: Nicolai Hähnle <[email protected]>
	Reviewed-by: Roland Scheidegger <[email protected]>
	Signed-off-by: Dave Airlie <[email protected]>
*	broadcom/vc4: Mark BOs as purgeable when they enter the BO cache	Boris Brezillon	2017-11-09	3	-48/+86
	This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all BOs placed in the mesa BO cache as purgeable so that the system can reclaim this memory under memory pressure.

	v2:
	- Removed BOs from the cache when they've been purged by the kernel
	- Check whether the madvise ioctl is supported or not before using it

	v3: Don't walk the whole list when we find a busy BO (by anholt, acked by Boris)

	Signed-off-by: Boris Brezillon <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
*	meson: Enable VC4's NEON assembly support.	Eric Anholt	2017-11-09	1	-0/+13
	Reviewed-by: Dylan Baker <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
	Tested-by: Timothy Arceri <[email protected]>
*	meson: Always link libgallium_dri.so against dep_thread.	Eric Anholt	2017-11-09	1	-0/+1
	Somehow on my cross build the -pthread is getting lost. All the other deps seem to work out fine.

	Reviewed-by: Dylan Baker <[email protected]>
	Tested-by: Timothy Arceri <[email protected]>
*	radeonsi: pack r600_surface better	Marek Olšák	2017-11-09	1	-11/+11
	160 -> 136 bytes

	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: pack r600_texture better	Marek Olšák	2017-11-09	1	-27/+26
	1752 -> 1736 bytes

	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: clean up r600_surface	Marek Olšák	2017-11-09	2	-29/+11
	216 -> 160 bytes

	Reviewed-by: Nicolai Hähnle <[email protected]>
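The size reductions reported in the three radeonsi commits above typically come from removing unused fields and reordering the rest so the compiler inserts less padding. The sketch below illustrates the reordering effect only; the struct and field names are hypothetical and are not the actual members of r600_surface or r600_texture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 1-byte fields interleaved with 8-byte fields,
 * so padding is inserted before each uint64_t member. */
struct ex_surface_loose {
    uint8_t  is_depth;
    uint64_t base_address;
    uint8_t  is_dirty;
    uint64_t fmask_address;
    uint16_t width;
};

/* Same fields, ordered from largest to smallest alignment. */
struct ex_surface_packed {
    uint64_t base_address;
    uint64_t fmask_address;
    uint16_t width;
    uint8_t  is_depth;
    uint8_t  is_dirty;
};

/* Bytes saved by reordering alone. The exact value is ABI-dependent,
 * so it is computed from sizeof rather than hard-coded. */
static size_t ex_bytes_saved(void)
{
    /* Guard against unsigned underflow on an unusual ABI. */
    assert(sizeof(struct ex_surface_loose) >= sizeof(struct ex_surface_packed));
    return sizeof(struct ex_surface_loose) - sizeof(struct ex_surface_packed);
}
```

Printing `ex_bytes_saved()` on the target ABI shows how much padding the reordering removes; the figures in the commit messages were measured on the real structs and will differ.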
*	radeonsi: remove r600_texture::non_disp_tiling	Marek Olšák	2017-11-09	2	-9/+0
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: remove DBG_NO_DISCARD_RANGE	Marek Olšák	2017-11-09	3	-5/+0
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	st/dri: use stapi flush instead of pipe flush when creating fences	Nicolai Hähnle	2017-11-09	1	-5/+6
	There may be pending operations (e.g. vertices) that need to be flushed by the state tracker.

	Found by inspection.

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: use a threaded context even for debug contexts	Nicolai Hähnle	2017-11-09	1	-9/+2
	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: record and dump time of flush	Nicolai Hähnle	2017-11-09	3	-1/+8
	Reviewed-by: Marek Olšák <[email protected]>
*	ddebug: optionally handle transfer commands like draws	Nicolai Hähnle	2017-11-09	4	-66/+288
	Transfer commands can have associated GPU operations.

	Enabled by passing GALLIUM_DDEBUG=transfers.

	Reviewed-by: Marek Olšák <[email protected]>
*	ddebug: dump context and before/after times of draws	Nicolai Hähnle	2017-11-09	2	-0/+10
	Reviewed-by: Marek Olšák <[email protected]>
*	ddebug: generalize print_named_xxx via a PRINT_NAMED macro	Nicolai Hähnle	2017-11-09	1	-15/+10
	Reviewed-by: Marek Olšák <[email protected]>
*	ddebug: rewrite to always use a threaded approach	Nicolai Hähnle	2017-11-09	4	-515/+546
	This patch has multiple goals:

	1. Off-load the writing of records in 'always' mode to another thread for performance.

	2. Allow using ddebug with threaded contexts. This really forces us to move some of the "after_draw" handling into another thread.

	3. Simplify the different modes of ddebug, both in the code and in the user interface, i.e. GALLIUM_DDEBUG. In particular, there's no 'pipelined' anymore, since we're always pipelined; and 'noflush' is replaced by 'flush', since we no longer flush by default.

	4. Fix the fences in pipelining mode. They previously relied on writes via pipe_context::clear_buffer. However, on radeonsi, those could (quite reasonably) end up in the SDMA buffer. So we use the newly added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.

	5. Improve pipelined mode overall, using the finer grained information provided by the new fences.

	Overall, the result is that pipelined mode should be more useful, and using ddebug in default mode is much less invasive, in the sense that it changes the overall driver behavior less (which is kind of crucial for a driver debugging tool).

	An example of the new hang debug output:

	    Gallium debugger active. Hang detection timeout is 1000ms.
	    GPU hang detected, collecting information...

	    Draw #   driver  prev BOP  TOP  BOP  dump file
	    -------------------------------------------------------------
	    2        YES     YES       YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000000
	    3        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000001
	    4        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000002
	    5        YES     NO        YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000003

	    Done.

	We can see that there were almost certainly 4 draws in flight when the hang happened: the top-of-pipe fence was signaled for all 4 draws, the bottom-of-pipe fence for none of them.

	In virtually all cases, we'd expect the first draw in the list to be at fault, but due to the GPU parallelism, it's possible (though highly unlikely) that one of the later draws causes a component to get stuck in a way that prevents the earlier draws from making progress as well.

	(In the above example, there were actually only 3 draws truly in flight: the last draw is a blit that waits for the earlier draws; however, its top-of-pipe fence is emitted before the cache flush and wait, and so the fact that the draw hasn't truly started yet can only be seen from a closer inspection of GPU state.)

	Acked-by: Marek Olšák <[email protected]>
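The way the hang report in the commit message above is read (a draw is in flight when its top-of-pipe fence has signaled but its bottom-of-pipe fence has not) can be sketched in a few lines of C. This is an illustrative reading of the table, not code from ddebug; the `ex_` record type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record mirroring one row of the hang report. */
struct ex_draw_record {
    unsigned draw_id;
    bool top_signaled;    /* "TOP" column */
    bool bottom_signaled; /* "BOP" column */
};

/* A draw counts as in flight if it started (TOP) but did not finish (BOP). */
static size_t ex_count_in_flight(const struct ex_draw_record *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].top_signaled && !recs[i].bottom_signaled)
            count++;
    }
    return count;
}

/* Index of the oldest in-flight draw, the usual first suspect, or -1 if none. */
static long ex_first_suspect(const struct ex_draw_record *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].top_signaled && !recs[i].bottom_signaled)
            return (long)i;
    }
    return -1;
}
```

Fed the four rows of the example report, this yields four in-flight draws with the first row as the suspect. As the commit message notes, the count can overestimate, because a top-of-pipe fence may signal before a draw has truly started.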
*	ddebug: use an atomic increment when numbering files	Nicolai Hähnle	2017-11-09	1	-1/+3
	Reviewed-by: Marek Olšák <[email protected]>
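Numbering dump files with an atomic increment keeps names unique when several threads write dumps at once. A minimal sketch of the idea follows, assuming C11 `<stdatomic.h>` rather than whatever atomic helper the actual patch uses; the `ex_` names are hypothetical, and the name format is only modeled on the dump file names shown in the ddebug hang output elsewhere in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Shared counter; static storage duration means it starts at zero. */
static atomic_uint ex_dump_index;

/* Returns a unique, increasing index even when called from several threads:
 * atomic_fetch_add returns the value held before the increment. */
static unsigned ex_next_dump_index(void)
{
    return atomic_fetch_add(&ex_dump_index, 1);
}

/* Writes "<prefix>_<pid>_<index as 8 digits>" into buf and returns the
 * snprintf result (the length that was or would have been written). */
static int ex_format_dump_name(char *buf, size_t size, const char *prefix, int pid)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s_%d_%08u", prefix, pid, ex_next_dump_index());
}
```

A plain `index++` on a shared variable could hand the same number to two threads; the fetch-and-add makes reading and incrementing a single indivisible step.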
*	dd/util: extract dd_get_debug_filename_and_mkdir	Nicolai Hähnle	2017-11-09	1	-12/+18
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_dump: add and use util_dump_transfer_usage	Nicolai Hähnle	2017-11-09	4	-16/+61
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_dump: add util_dump_ns	Nicolai Hähnle	2017-11-09	2	-0/+13
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_dump: export util_dump_ptr	Nicolai Hähnle	2017-11-09	2	-2/+5
	Change format to %p while we're at it.

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE	Nicolai Hähnle	2017-11-09	1	-1/+88
	v2: use uncached system memory for the fence, and use the CPU to clear it so we never read garbage when checking the fence

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: document some subtle details of fence_finish & fence_server_sync	Nicolai Hähnle	2017-11-09	1	-0/+22
	v2: remove the change to si_fence_server_sync, we'll handle that more robustly

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add pipe_context::callback	Nicolai Hähnle	2017-11-09	3	-0/+58
	For running post-draw operations inside the driver thread. ddebug will use it.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_threaded: implement pipe_context::set_log_context	Nicolai Hähnle	2017-11-09	1	-0/+11
	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_threaded: avoid syncs for get_query_result	Nicolai Hähnle	2017-11-09	1	-17/+48
	Queries should still get marked as flushed when flushes are executed asynchronously in the driver thread.

	To this end, the management of the unflushed_queries list is moved into the driver thread.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_threaded: implement asynchronous flushes	Nicolai Hähnle	2017-11-09	6	-27/+238
	This requires out-of-band creation of fences, and will be signaled to the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.

	v2:
	- remove an incorrect assertion
	- handle fence_server_sync for unsubmitted fences by relying on the improved cs_add_fence_dependency
	- only implement asynchronous flushes on amdgpu

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium/u_threaded: mark queries flushed only for non-deferred flushes	Nicolai Hähnle	2017-11-09	2	-4/+6
	The driver uses (and must use) the flushed flag of queries as a hint that it does not have to check for synchronization with currently queued up commands. Deferred flushes do not actually flush queued up commands, so we must not set the flushed flag for them.

	Found by inspection.

	Reviewed-by: Marek Olšák <[email protected]>
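The rule in the commit message above (set the flushed hint only when queued commands were really submitted) can be sketched as follows. This is a simplified illustration, not the u_threaded_context code: the flag values, the `ex_query` type, and `ex_flush` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not Mesa's actual values. */
enum {
    EX_FLUSH_DEFERRED = 1u << 0,
    EX_FLUSH_ASYNC    = 1u << 1,
};

struct ex_query {
    bool flushed;          /* hint: no sync with queued commands needed */
    struct ex_query *next; /* next entry in the unflushed list */
};

/* Walks the unflushed list and returns how many queries were marked.
 * A deferred flush submits nothing, so the hint must stay false. */
static unsigned ex_flush(struct ex_query *unflushed, unsigned flags)
{
    assert((flags & ~(unsigned)(EX_FLUSH_DEFERRED | EX_FLUSH_ASYNC)) == 0);

    if (flags & EX_FLUSH_DEFERRED)
        return 0;

    unsigned marked = 0;
    for (struct ex_query *q = unflushed; q != NULL; q = q->next) {
        q->flushed = true;
        marked++;
    }
    return marked;
}
```

If the hint were set on a deferred flush, a later result read could skip synchronization while the commands that produce the result are still queued.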
*	radeonsi: move fence functions to si_fence.c	Nicolai Hähnle	2017-11-09	6	-267/+312
	Reviewed-by: Marek Olšák <[email protected]>
*	winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences	Nicolai Hähnle	2017-11-09	4	-12/+41
	The idea is to fix the following interleaving of operations that can arise from deferred fences:

	    Thread 1 / Context 1          Thread 2 / Context 2
	    --------------------          --------------------
	    f = deferred flush
	    <------- application-side synchronization ------->
	                                  fence_server_sync(f)
	                                  ...
	                                  flush()
	    flush()

	We will now stall in fence_server_sync until the flush of context 1 has completed.

	This scenario was unlikely to occur previously, because applications seem to be doing

	    Thread 1 / Context 1          Thread 2 / Context 2
	    --------------------          --------------------
	    f = glFenceSync()
	    glFlush()
	    <------- application-side synchronization ------->
	                                  glWaitSync(f)
	                                  ...

	and indeed they probably have to use this ordering to avoid deadlocks in the GLX model, where all GL operations conceptually go through a single connection to the X server. However, it's less clear whether applications have to do this with other WSI (i.e. EGL). Besides, even this sequence of GL commands can be translated into the Gallium-level sequence outlined above when Gallium threading and asynchronous flushes are used. So it makes sense to be more robust.

	As a side effect, we no longer busy-wait on submission_in_progress.

	We won't enable asynchronous flushes on radeon, but add a cs_add_fence_dependency stub anyway to document the potential issue.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE bits	Nicolai Hähnle	2017-11-09	2	-0/+16
	These bits are intended to be used by the ddebug hang detection and are named in analogy to the Vulkan stage bits (and the corresponding Radeon pipeline event).

	Hang detection needs fences on the granularity of individual commands, which nothing else really covers. The closest alternative would have been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object and we really want a per-screen object, (b) queries don't offer a wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is meant to imply that GPU caches are flushed, which the new bits explicitly aren't.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add PIPE_FLUSH_ASYNC and PIPE_FLUSH_HINT_FINISH	Nicolai Hähnle	2017-11-09	3	-1/+18
	Also document some subtleties of pipe_context::flush.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: remove unused and deprecated u_time.h	Nicolai Hähnle	2017-11-09	8	-157/+1
	Cc: Jose Fonseca <[email protected]>
	Reviewed-by: Marek Olšák <[email protected]>
*	util: move os_time.[ch] to src/util	Nicolai Hähnle	2017-11-09	55	-380/+53
	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: always use async compiles when creating shader/compute states	Nicolai Hähnle	2017-11-09	2	-34/+50
	With Gallium threaded contexts, creating shader/compute states is effectively a screen operation, so we should not use context state. In particular, this allows us to avoid using the context's LLVM TargetMachine.

	This isn't an issue yet because u_threaded_context filters out non-async debug callbacks, and we disable threaded contexts for debug contexts. However, we may want to change that in the future.

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: fix potential use-after-free of debug callbacks	Nicolai Hähnle	2017-11-09	1	-0/+4
	Found by inspection.

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: move pipe debug callback to si_context	Nicolai Hähnle	2017-11-09	6	-19/+19
	Reviewed-by: Marek Olšák <[email protected]>
*	util: move pipe_barrier into src/util and rename to util_barrier	Nicolai Hähnle	2017-11-09	4	-88/+13
	The #if guard is probably not 100% equivalent to the previous PIPE_OS check, but if anything it should be an over-approximation (are there pthread implementations without barriers?), so people will get either a good implementation or compile errors that are easy to fix.

	Reviewed-by: Marek Olšák <[email protected]>
*	gallium: add async debug message forwarding helper	Nicolai Hähnle	2017-11-09	4	-0/+192
	v2: use util_vasprintf for Windows portability

	Reviewed-by: Marek Olšák <[email protected]> (v1)
*	gallium: clarify the constraints on sampler_view_destroy	Nicolai Hähnle	2017-11-09	2	-6/+20
	r600 expects the context that created the sampler view to still be alive (there is a per-context list of sampler views).

	svga currently bails when the context of destruction is not the same as creation.

	The GL state tracker, which is the only one that runs into the multi-context subtleties (due to share groups), already guarantees that sampler views are destroyed before their context of creation is destroyed.

	Most drivers are context-agnostic, so the warning message in pipe_sampler_view_release doesn't really make sense.

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key	Nicolai Hähnle	2017-11-09	1	-4/+4
	We only need the lock to guard changes in the variant linked list. The actual compilation can happen outside the lock, since we use the ready fence as a guard.

	v2: fix double-unlock

	Reviewed-by: Marek Olšák <[email protected]>
*	radeonsi: use ready fences on all shaders, not just optimized ones	Nicolai Hähnle	2017-11-09	3	-26/+67
	There's a race condition between si_shader_select_with_key and si_bind_XX_shader:

	    Thread 1                         Thread 2
	    --------                         --------
	    si_shader_select_with_key
	      begin compiling the first
	      variant
	      (guarded by sel->mutex)
	                                     si_bind_XX_shader
	                                       select first_variant by default
	                                       as state->current
	                                     si_shader_select_with_key
	                                       match state->current and
	                                       early-out

	Since thread 2 never takes sel->mutex, it may go on rendering without a PM4 for that shader, for example.

	The solution taken by this patch is to broaden the scope of shader->optimized_ready to a fence shader->ready that applies to all shaders. This does not hurt the fast path (if anything it makes it faster, because we don't explicitly check is_optimized).

	It will also allow reducing the scope of sel->mutex locks, but this is deferred to a later commit for better bisectability.

	Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

	Reviewed-by: Marek Olšák <[email protected]>
*	r600g: use SIMPLE_FLOAT for blending to enable some optimizations	Ilia Mirkin	2017-11-08	2	-0/+2
	Radeonsi also sets this flag. Seems to avoid pulling up the destination RT value when the dst blend factor is zero if it's not otherwise being loaded. Among other things, it allows blending to overwrite infinity/NaN values in the destination RT.

	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Roland Scheidegger <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	nv50: make blending work so that zero wins in a multiplication	Ilia Mirkin	2017-11-08	1	-0/+5
	This matches nvc0 behavior, tested with the fbo-float-nan piglit.

	Signed-off-by: Ilia Mirkin <[email protected]>
	Reviewed-by: Tobias Klausmann <[email protected]>
*	amdgpu: use simple mtx	Timothy Arceri	2017-11-09	5	-44/+45
	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
*	mesa: use simple mtx in core mesa	Timothy Arceri	2017-11-09	1	-11/+11
	Results from x11perf -copywinwin10 on Eric's SKL:

	4.33338% ± 0.905054% (n=40)

	Reviewed-by: Marek Olšák <[email protected]>
	Reviewed-by: Nicolai Hähnle <[email protected]>
	Tested-by: Yogesh Marathe <[email protected]>
*	broadcom/vc5: Add vc5_drm.h to the release tarball	Andreas Boll	2017-11-08	1	-0/+1
	Fixes: 45bb8f29571 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
	Cc: 17.3 <[email protected]>
	Signed-off-by: Andreas Boll <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
	Reviewed-by: Eric Anholt <[email protected]>
	Reviewed-by: Emil Velikov <[email protected]>
*	clover: use the unified check for c++11 instead of the gcc version number	Gert Wollny	2017-11-08	1	-3/+3
	So far clover based its test for compiler support on the version of gcc, while in reality support for c++11 is required. This patch replaces the version check by the check unified for all modules that require c++11.

	Reviewed-by: Emil Velikov <[email protected]>
*	swr: Replace the check for c++11 by the unified version	Gert Wollny	2017-11-08	1	-2/+2
	Reviewed-by: Emil Velikov <[email protected]>