The idea is to fix the following interleaving of operations
that can arise from deferred fences:
Thread 1 / Context 1                 Thread 2 / Context 2
--------------------                 --------------------
f = deferred flush
<------- application-side synchronization ------->
                                     fence_server_sync(f)
                                     ...
                                     flush()
flush()
We will now stall in fence_server_sync until the flush of context 1
has completed.
This scenario was unlikely to occur previously, because applications
seem to be doing:

Thread 1 / Context 1                 Thread 2 / Context 2
--------------------                 --------------------
f = glFenceSync()
glFlush()
<------- application-side synchronization ------->
                                     glWaitSync(f)
... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSIs (e.g. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.
As a side effect, we no longer busy-wait on submission_in_progress.
We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.
Reviewed-by: Marek Olšák <[email protected]>
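
A minimal sketch of such a stub; the winsys types and function name
here are assumptions for illustration, not the committed code:

   /* Hypothetical no-op stub: without asynchronous flushes, a deferred
    * fence can only become visible to another context after the
    * creating context has flushed, so there is no real dependency to
    * record here. */
   static void radeon_drm_cs_add_fence_dependency(struct radeon_winsys_cs *cs,
                                                  struct pipe_fence_handle *fence)
   {
      /* Nothing to do; see the rationale above. */
   }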

---

These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).
Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.
Reviewed-by: Marek Olšák <[email protected]>
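
A sketch of how a hang detector might use such a bit; this follows the
Gallium screen/context interfaces, but treat the details as an
assumption:

   static bool check_for_hang(struct pipe_context *pipe,
                              struct pipe_screen *screen,
                              uint64_t timeout_ns)
   {
      struct pipe_fence_handle *fence = NULL;
      /* Bottom-of-pipe: signals when the command has executed, without
       * implying any cache flush (unlike PIPE_QUERY_GPU_FINISHED). */
      pipe->flush(pipe, &fence, PIPE_FLUSH_BOTTOM_OF_PIPE);
      bool hung = !screen->fence_finish(screen, NULL, fence, timeout_ns);
      screen->fence_reference(screen, &fence, NULL);
      return hung;
   }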

---

Also document some subtleties of pipe_context::flush.
Reviewed-by: Marek Olšák <[email protected]>

---

v2:
- style fixes
- fix missing timeout handling in futex path
Reviewed-by: Marek Olšák <[email protected]>
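
For reference, a minimal sketch of a futex wait with a timeout on
Linux (the helper name is hypothetical); FUTEX_WAIT takes a relative
timespec and fails with errno == ETIMEDOUT when it expires:

   #include <linux/futex.h>
   #include <sys/syscall.h>
   #include <unistd.h>
   #include <stdint.h>
   #include <time.h>

   static int futex_wait_timeout(uint32_t *addr, uint32_t expected,
                                 const struct timespec *rel_timeout)
   {
      /* Sleeps only while *addr == expected; rel_timeout may be NULL
       * to wait forever. */
      return syscall(SYS_futex, addr, FUTEX_WAIT, expected, rel_timeout,
                     NULL, 0);
   }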

---

C11 threads were changed to use struct timespec instead of xtime, and
thrd_sleep got a second argument.
See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1554.htm and
http://en.cppreference.com/w/c/thread/{thrd_sleep,cnd_timedwait,mtx_timedlock}
Note that cnd_timedwait is spec'd to be relative to TIME_UTC / CLOCK_REALTIME.
v2: Fix Windows build errors. Tested with a default Appveyor config
that uses Visual Studio 2013. Judging from Brian's email and
random internet sources, Visual Studio 2015 does have timespec
and timespec_get, hence the _MSC_VER-based guard which I have
not tested.
Cc: Jose Fonseca <[email protected]>
Cc: Brian Paul <[email protected]>
Reviewed-by: Marek Olšák <[email protected]> (v1)
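
A small usage sketch of the revised C11 interfaces, assuming a
conforming <threads.h>:

   #include <threads.h>
   #include <time.h>

   static mtx_t mutex;   /* assume mtx_init/cnd_init ran during setup */
   static cnd_t cond;

   static void sleep_then_wait(void)
   {
      /* thrd_sleep now takes a duration plus an optional remainder. */
      struct timespec dur = { .tv_sec = 0, .tv_nsec = 500000000 };
      thrd_sleep(&dur, NULL);

      /* cnd_timedwait takes an absolute, TIME_UTC-based timespec. */
      struct timespec until;
      timespec_get(&until, TIME_UTC);
      until.tv_sec += 2;
      mtx_lock(&mutex);
      if (cnd_timedwait(&cond, &mutex, &until) == thrd_timedout) {
         /* timed out after roughly two seconds */
      }
      mtx_unlock(&mutex);
   }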

---

Cc: Jose Fonseca <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>

---

Reviewed-by: Marek Olšák <[email protected]>

---

With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.
In particular, this allows us to avoid using the context's LLVM
TargetMachine.
This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.
Reviewed-by: Marek Olšák <[email protected]>

---

Found by inspection.
Reviewed-by: Marek Olšák <[email protected]>

---

Reviewed-by: Marek Olšák <[email protected]>

---

Schedule one job for every thread, and wait on a barrier inside the job
execution function.
v2: avoid alloca (fixes Windows build error)
Reviewed-by: Marek Olšák <[email protected]> (v1)
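
The shape of the test, sketched with raw pthreads rather than the
util_queue API actually used (names here are illustrative):

   #include <pthread.h>
   #define NUM_THREADS 8

   static pthread_barrier_t bar;

   static void *job(void *arg)
   {
      /* Every job blocks here, so the barrier only releases if one job
       * is truly running on every thread at the same time. */
      pthread_barrier_wait(&bar);
      return NULL;
   }

   static void run_test(void)
   {
      pthread_t threads[NUM_THREADS];
      pthread_barrier_init(&bar, NULL, NUM_THREADS);
      for (int i = 0; i < NUM_THREADS; i++)
         pthread_create(&threads[i], NULL, job, NULL);
      for (int i = 0; i < NUM_THREADS; i++)
         pthread_join(threads[i], NULL);
      pthread_barrier_destroy(&bar);
   }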

---

The #if guard is probably not 100% equivalent to the previous PIPE_OS
check, but if anything it should be an over-approximation (are there
pthread implementations without barriers?), so people will get either
a good implementation or compile errors that are easy to fix.
Reviewed-by: Marek Olšák <[email protected]>
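
The guard presumably ends up with a shape along these lines (an
assumption, not the committed code; macOS, for instance, ships
pthreads without barriers):

   #if defined(HAVE_PTHREAD) && !defined(__APPLE__)
   typedef pthread_barrier_t util_barrier;
   #else
   /* e.g. a counter-based fallback built on a mutex and a condvar */
   typedef struct { pthread_mutex_t mtx; pthread_cond_t cv;
                    unsigned count, waiters; } util_barrier;
   #endif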

---

v2: use util_vasprintf for Windows portability
Reviewed-by: Marek Olšák <[email protected]> (v1)

---

Some locking is unfortunately required, because well-formed GL programs
can have multiple threads racing to access the same texture, e.g.: two
threads/contexts rendering from the same texture, or one thread destroying
a context while the other is rendering from or modifying a texture.
Since even the simple mutex caused noticeable slowdowns in the piglit
drawoverhead micro-benchmark, this patch uses a slightly more involved
approach to keep locks out of the fast path:
- the initial lookup of sampler views happens without taking a lock
- a per-texture lock is only taken when we have to modify the sampler
view(s)
- since each thread mostly operates only on the entry corresponding to
its context, the main issue is re-allocation of the sampler view array
when it needs to be grown, but the old copy is not freed
Old copies of the sampler views array are kept around in a linked list
until the entire texture object is deleted. The total memory wasted
in this way is roughly equal to the size of the current sampler views
array.
Fixes non-deterministic memory corruption in some
dEQP-EGL.functional.sharing.gles2.multithread.* tests, e.g.
dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.create_texture_render
Reviewed-by: Marek Olšák <[email protected]>
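
A sketch of the resulting fast path, with hypothetical names; the key
point is that readers never lock, because superseded arrays stay
allocated:

   struct st_sampler_views {
      struct st_sampler_views *older;  /* old copies, freed with the
                                        * texture object */
      unsigned count;
      struct st_sampler_view views[];  /* flexible array member */
   };

   /* Unlocked lookup: the views pointer is published atomically and
    * old arrays remain valid, so a stale read is harmless. */
   static struct st_sampler_view *
   find_view(struct st_texture_object *tex, struct st_context *st)
   {
      struct st_sampler_views *sv = p_atomic_read(&tex->views);
      for (unsigned i = 0; i < sv->count; i++)
         if (sv->views[i].st == st)
            return &sv->views[i];
      return NULL;   /* slow path: take tex->lock and grow the array */
   }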

---

Move the early-out for surface-based textures earlier. This narrows the
scope of the locking added in a follow-up commit.
Fix one remaining case of initializing a surface-based texture
without properly finalizing it.
Reviewed-by: Marek Olšák <[email protected]>

---

r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).
svga currently bails when the context of destruction is not the same
as the context of creation.
The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.
Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.
Reviewed-by: Marek Olšák <[email protected]>

---

We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.
v2: fix double-unlock
Reviewed-by: Marek Olšák <[email protected]>
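
Roughly, the locking now looks like this (hypothetical names;
util_queue_fence is the ready guard):

   struct si_shader_variant *variant;
   bool is_new = false;

   mtx_lock(&sel->mutex);
   variant = find_variant(sel, key);          /* list walk */
   if (!variant) {
      variant = add_variant_locked(sel, key); /* list mutation only */
      is_new = true;
   }
   mtx_unlock(&sel->mutex);

   if (is_new) {
      compile_variant(variant);               /* runs outside the lock */
      util_queue_fence_signal(&variant->ready);
   } else {
      util_queue_fence_wait(&variant->ready); /* ready fence is the guard */
   }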

---

There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:
Thread 1                             Thread 2
--------                             --------
si_shader_select_with_key
begin compiling the first
variant
(guarded by sel->mutex)
                                     si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                     si_shader_select_with_key
                                     match state->current and early-out
Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.
The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).
It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.
Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render
Reviewed-by: Marek Olšák <[email protected]>
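
The effect on the fast path can be sketched like this (names are
approximations):

   /* Early-out in si_shader_select_with_key: matching state->current
    * is only safe if we then wait on the variant's ready fence, which
    * the compiling thread signals once the shader state exists. */
   if (current && memcmp(&current->key, key, sizeof(*key)) == 0) {
      util_queue_fence_wait(&current->ready);
      return current;
   }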

---

Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).
Signaling a fence is a single atomic operation in the fast case plus a
syscall in the slow case.
Testing if a fence is signaled is the same as before (a simple comparison),
but waiting on a fence is now no more expensive than just testing it in
the fast (already signaled) case.
v2:
- style fixes
- use p_atomic_xxx macros with the right barriers
Acked-by: Marek Olšák <[email protected]>
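
A condensed sketch of such a futex-backed fence; the state encoding
and helper names are assumptions (0 = unsignaled, 1 = signaled,
2 = unsignaled with waiters):

   static void fence_signal(uint32_t *f)
   {
      /* Fast path is a single atomic exchange; the futex_wake syscall
       * only happens if somebody is actually sleeping. */
      if (p_atomic_xchg(f, 1) == 2)
         futex_wake(f, INT_MAX);
   }

   static void fence_wait(uint32_t *f)
   {
      uint32_t v = p_atomic_read(f);
      while (v != 1) {
         if (v != 2)
            v = p_atomic_cmpxchg(f, 0, 2);   /* announce a waiter */
         if (v != 1) {
            futex_wait(f, 2);                /* sleep while *f == 2 */
            v = p_atomic_read(f);
         }
      }
   }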

---

Reviewed-by: Marek Olšák <[email protected]>

---

Reviewed-by: Marek Olšák <[email protected]>

---

Reviewed-by: Marek Olšák <[email protected]>

---

The closest match among the old-style gcc builtins is
__sync_lock_test_and_set; however, that is only guaranteed to work
with values 0 and 1, and it only provides an acquire barrier. I also
don't know about other OSes, so we
provide a simple & stupid emulation via p_atomic_cmpxchg.
Reviewed-by: Marek Olšák <[email protected]>
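
The emulation reduces to a compare-exchange retry loop; a minimal
sketch (p_atomic_cmpxchg returns the value it found):

   static inline uint32_t emulated_atomic_xchg(uint32_t *v, uint32_t new_val)
   {
      uint32_t expected;
      do {
         expected = *v;
         /* retry until no other thread changed *v in between */
      } while (p_atomic_cmpxchg(v, expected, new_val) != expected);
      return expected;
   }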

---

v2: style fixes
Reviewed-by: Marek Olšák <[email protected]> (v1)

---

According to the GLSL ES 3.20, GLSL 4.50, and GLSL 1.20 specs:
"To force all output variables to be invariant, use the pragma
#pragma STDGL invariant(all)
before all declarations in a shader."
Notably, this is only supposed to affect output variables. Furthermore,
"Only variables output from a shader can be candidates for invariance."
It looks like this has been wrong since we first supported the pragma in
2011 (commit 86b4398cd158024f6be9fa830554a11c2a7ebe0c).
Fixes dEQP-GLES2.functional.shaders.preprocessor.pragmas.pragma_fragment.
v2: Now that all cases are identical (other than compute shaders, which
have no output variables anyway), we can drop the switch statement
entirely. We also don't need the current_function == NULL check;
this was a holdover from when we had a single var_mode_out for both
function parameters and shader varyings, in the bad old days.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>

---

Patch exposes sRGB visuals and adds DRI integer query support for
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB. Further changes make sure that
we mark whether the app explicitly wanted sRGB, and for those
framebuffers we don't turn sRGB off in intel_gles3_srgb_workaround.
This way we keep compatibility for existing applications relying on
the default sRGB behavior and only add more visual support.
With this change, following dEQP tests start to pass:
dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb
v2: some code cleanup (Emil Velikov)
update num_formats correctly (reported by [email protected])
v3: cleanup, remove redundant is_srgb
rename explicit_srgb as 'need_srgb' to follow style better
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]> (v2)
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102264
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102354
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102503

---

The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <[email protected]>
Cc: [email protected]

---

Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't
lower indirects on the fragment shader, which is wrong. Instead of using
a single indirect mask, take advantage of our new little helper.
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: [email protected]

---

Radeonsi also sets this flag. Seems to avoid pulling up the destination
RT value when the dst blend factor is zero if it's not otherwise being
loaded. Among other things, it allows blending to overwrite infinity/NaN
values in the destination RT.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>

---

This matches nvc0 behavior, tested with the fbo-float-nan piglit.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tobias Klausmann <[email protected]>

---

I think it's more clear to only call emit_access once. The only
difference between the two calls is the value of size_mul used for the
offset parameter... but you really have to look at it to be sure.
The s/is_64bit/is_double/ change is because there are no int64_t or
uint64_t matrix types.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>

---

I was going to squash this with the previous commit, but there's a lot
of churn in that commit.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>

---

Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>

---

Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>

---

Without this, the SPIR-V generator has to deal with a bunch of junk
like:
(swiz z (swiz xxx (swiz x (var_ref packed:binormal.z,light_dir))))
It seems better to cull that stuff out than to add code to deal with
it. The problem is the way swizzles to and from scalars have to be
handled in SPIR-V.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Thomas Helland <[email protected]>

---

Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: <[email protected]>

---

If there is a long sequence of swizzled swizzles, compact all of them
down to a single swizzle.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: <[email protected]>
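
The core of such a pass is just swizzle composition; a sketch with
hypothetical names:

   /* Component i of the outer swizzle selects component outer[i] of
    * the inner swizzle's result, so the combined swizzle is a simple
    * index composition: (swiz yx (swiz zw v)) becomes (swiz wz v). */
   static void combine_swizzles(const uint8_t *inner, const uint8_t *outer,
                                uint8_t *combined, unsigned num_components)
   {
      for (unsigned i = 0; i < num_components; i++)
         combined[i] = inner[outer[i]];
   }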

---

I could not find any remaining users.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>

---

glsl/lower_shared_reference.cpp: In member function ‘virtual void
{anonymous}::lower_shared_reference_visitor::insert_buffer_access(void*,
ir_dereference*, const glsl_type*, ir_rvalue*, unsigned int, int)’:
glsl/lower_shared_reference.cpp:244:58: warning: unused parameter
‘channel’ [-Wunused-parameter]
int channel)
^~~~~~~
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>

---

This is derived from the tgsi/radeonsi code for the GLSL intrinsics.
This should preemptively fix radv for the upcoming SPIR-V patches.
v2: actually use wait_cnt, sleep-deprived dad time! (Bas)
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>

---

Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>

---

Results from x11perf -copywinwin10 on Eric's SKL:
4.33338% ± 0.905054% (n=40)
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Yogesh Marathe <[email protected]>

---

While modern pthread mutexes are very fast, they still incur a call into
an external DSO and the overhead of the generality and features of
pthread mutexes. Most mutexes in mesa only need lock/unlock, and the
idea here is that we can inline the atomic operation and make the fast
case just two instructions. Mutexes are subtle and finicky to
implement, so we carefully copy the implementation from Ulrich
Drepper's well-written and well-reviewed paper:
"Futexes Are Tricky"
http://www.akkadia.org/drepper/futex.pdf
We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock. Further, the uncontended lock boils down to
a cmpxchg and an untaken branch, and the uncontended unlock is just a
locked decrement and an untaken branch. We use __builtin_expect() to
indicate that contention
is unlikely so that gcc will put the contention code out of the main code
flow.
A fast mutex only supports lock/unlock; it can't be recursive or used
with condition variables. We keep the pthread mutex implementation
around for the few places where we use condition variables or
recursive locking.
For platforms or compilers where futex and atomics aren't available,
simple_mtx_t falls back to the pthread mutex.
The pthread mutex lock/unlock overhead shows up on benchmarks for
CPU-bound applications. Most CPU-bound cases are helped and some of our
internal bind_buffer_object-heavy benchmarks gain up to 10%.
Signed-off-by: Kristian Høgsberg <[email protected]>
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
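
For reference, the shape of Drepper's "mutex3" (see the paper above);
p_atomic_* and the futex helpers are assumed wrappers here:

   /* States: 0 = unlocked, 1 = locked, 2 = locked with waiters. */
   static inline void simple_mtx_lock(int *m)
   {
      int c = p_atomic_cmpxchg(m, 0, 1);     /* fast path: one cmpxchg */
      if (__builtin_expect(c != 0, 0)) {
         if (c != 2)
            c = p_atomic_xchg(m, 2);         /* mark as contended */
         while (c != 0) {
            futex_wait(m, 2);                /* sleep while contended */
            c = p_atomic_xchg(m, 2);
         }
      }
   }

   static inline void simple_mtx_unlock(int *m)
   {
      /* p_atomic_dec_return yields the new value; non-zero means
       * someone is (or may be) waiting. */
      if (__builtin_expect(p_atomic_dec_return(m) != 0, 0)) {
         *m = 0;
         futex_wake(m, 1);
      }
   }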

---

When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However, I seem to have failed to finish
hooking up the final steps required to have the data hang around.
Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.
Cc: "17.2 17.3" <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177

---

Cc: "17.2 17.3" <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

---

This turned out to be a dead end; it is much easier and less
error-prone to just cache the IR used by the driver's backend, e.g.
TGSI or NIR.
Cc: "17.2 17.3" <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

---

This has a bit of a surprising effect:
For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_SAMPLER_STATE_POINTERS_XS. It tries to avoid this for compute:
   if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
      /* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
      genX(emit_sampler_state_pointers_xs)(brw, stage_state);
   } ...
However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break. We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers. Nothing good can come of this.
Found by inspection while debugging a GPU hang. Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
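
The fix then presumably reduces to initializing the field during
compute-stage setup, along these lines (the exact placement is an
assumption):

   brw->cs.base.stage = MESA_SHADER_COMPUTE;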

---

Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: [email protected]

---

Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: [email protected]

---

... as can happen with various types like mat4, or else we'll smash the
stack writing past the end of components_local[].
Fixes: 5a0d3e1129b7 ("nir: Print the components referenced for split or packed shader in/outs.")
Reviewed-by: Jason Ekstrand <[email protected]>