mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	st/mesa: stop passing gl_linked_shader to set_affected_state_flags()	Timothy Arceri	2017-01-09	1	-7/+6
\| \| \| \| \| \|	We now get everything we need from the gl_program param. Reviewed-by: Nicolai Hähnle <[email protected]>
*	st/mesa/glsl: set num_images directly in shader_info	Timothy Arceri	2017-01-09	6	-20/+13
\| \| \| \| \| \|	This change also removes the now duplicate NumImages field. Reviewed-by: Nicolai Hähnle <[email protected]>
*	st/mesa: pass gl_program to st_bind_ssbos()	Timothy Arceri	2017-01-09	1	-21/+21
\| \| \| \| \| \|	We no longer need to pass gl_shader_program. Reviewed-by: Nicolai Hähnle <[email protected]>
*	nir: add another comparison simplification	Timothy Arceri	2017-01-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On BDW: total instructions in shared programs: 13061877 -> 13060965 (-0.01%) instructions in affected programs: 133569 -> 132657 (-0.68%) helped: 566 HURT: 0 total cycles in shared programs: 256611784 -> 256599536 (-0.00%) cycles in affected programs: 861016 -> 848768 (-1.42%) helped: 379 HURT: 73 Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences.	Kenneth Graunke	2017-01-09	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On BDW: total instructions in shared programs: 13074882 -> 13068703 (-0.05%) instructions in affected programs: 1823116 -> 1816937 (-0.34%) helped: 4187 HURT: 537 total cycles in shared programs: 256622718 -> 256425382 (-0.08%) cycles in affected programs: 123790120 -> 123592784 (-0.16%) helped: 3823 HURT: 2037 total spills in shared programs: 15276 -> 14929 (-2.27%) spills in affected programs: 9446 -> 9099 (-3.67%) helped: 352 HURT: 1 total fills in shared programs: 20496 -> 20144 (-1.72%) fills in affected programs: 13040 -> 12688 (-2.70%) helped: 352 HURT: 1 LOST: 2 GAINED: 21 v2: Rely on 'a' being a well-formed boolean (Connor, Eric). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Convert ineg(b2i(a)) to a if it's a boolean.	Kenneth Graunke	2017-01-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On BDW: total instructions in shared programs: 13071119 -> 13070371 (-0.01%) instructions in affected programs: 83424 -> 82676 (-0.90%) helped: 505 HURT: 45 (all TCS, all hurt by a single instruction) total cycles in shared programs: 256601322 -> 256588932 (-0.00%) cycles in affected programs: 819410 -> 807020 (-1.51%) helped: 450 HURT: 57 total loops in shared programs: 2950 -> 2942 (-0.27%) loops in affected programs: 8 -> 0 helped: 7 HURT: 0 v2: Drop unnecessary 'a@bool' annotation (Connor, Eric). Add a comment explaining the rule (Ian). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> [v1] Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Move TES input VUE map calculation out a layer.	Kenneth Graunke	2017-01-07	3	-9/+11
\| \| \| \| \| \| \| \| \| \| \|	In Vulkan, we'll compile the TCS and TES at the same time, so I can just pass the TCS output VUE map to brw_compile_tes as the TES input VUE map. So, we only need to do this in GL. Move it to the GL-specific layer. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Pass NULL for gl_program when compiling TES.	Kenneth Graunke	2017-01-07	1	-1/+1
\| \| \| \| \| \| \| \|	This isn't needed, and Vulkan doesn't have one. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Move TES spacing/domain/topology setup to brw_compile_tes().	Kenneth Graunke	2017-01-07	2	-33/+34
\| \| \| \| \| \| \| \|	Moving this down a layer lets us share code between Vulkan and GL. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Access TES shader info via NIR.	Kenneth Graunke	2017-01-07	1	-6/+6
\| \| \| \| \| \| \| \|	NIR exists in both GL and Vulkan, but gl_program is GL specific. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	mesa: Introduce a compiler enum for tessellation spacing.	Kenneth Graunke	2017-01-07	11	-47/+54
\| \| \| \| \| \| \| \| \| \|	It feels weird using GL_* enums in a Vulkan driver. v2: Fix the TESS_SPACING -> PIPE_TESS_SPACING conversion. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	compiler: Change shader_info->tes.vertex_order into a ccw boolean.	Kenneth Graunke	2017-01-07	4	-13/+7
\| \| \| \| \| \| \| \| \| \|	The vertex order is either clockwise or counterclockwise. We can just store a "ccw" boolean rather than GLenum values. I don't want to use GLenums in a Vulkan driver, and even in GL a simple boolean works fine. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	anv/pipeline: Call NIR passes using NIR_PASS_V	Jason Ekstrand	2017-01-07	1	-31/+15
\| \| \| \| \| \|	This lets us get validation without having to do it manually. Reviewed-by: Timothy Arceri <[email protected]>
*	anv/pipeline: Only call remove_dead_variables once	Jason Ekstrand	2017-01-07	1	-3/+3
\| \| \| \| \| \| \|	It can handle multiple modes at a time now so there's no reason to call it repeatedly. Reviewed-by: Timothy Arceri <[email protected]>
*	Revert recent GLSL slot counting fiasco.	Kenneth Graunke	2017-01-07	5	-62/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I apparently broke mark_whole_variable in ir_set_program_inouts. It was passing a type that wasn't var->type, so the wrapper didn't work out. It's all broken, revert it and start over. Fixes all kinds of things on other drivers. Revert "glsl: Make is_fixed_function_array actually check for varyings." This reverts commit 42699e12711668a142b7acf11c168cf4301c1295. Revert "glsl: Mark whole variable used for ClipDistance and TessLevel." This reverts commit 5c580e64cc206ab160e1767c42e4d6c81f67da4d. Revert "glsl: Override the # of varying slots for ClipDistance and TessLevel." This reverts commit 8b5749f65ac434961308ccb579fb8a816e4f29d5. Revert "glsl: Create and use a new ir_variable::count_attribute_slots() wrapper." This reverts commit 6aa5cb34d03765b7be8611aa516bc201bd337f73.
*	glsl: Make is_fixed_function_array actually check for varyings.	Kenneth Graunke	2017-01-07	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	We can't check VARYING_SLOT_* locations until we've determined that the variable is actually a varying. Fixes assert failures in drivers which actually use this path, such as radeonsi and i915. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99314 Signed-off-by: Kenneth Graunke <[email protected]>
*	drirc: Allow extension midshader for Divinity: Original Sin (EE)	Kai Wasserbäch	2017-01-07	1	-0/+4
\| \| \| \| \| \| \| \|	See also <https://bugs.freedesktop.org/show_bug.cgi?id=93551#c27> where this was first observed as a requirement. Signed-off-by: Kai Wasserbäch <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
*	glsl: fix opt_minmax redundancy checks against baserange	Timothy Arceri	2017-01-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Marking operations as redundant if they are equal to the base range is fine when the tree structure is something like this: max / \ max b / \ 3 max / \ 3 a But the opt falls apart with a tree like this: max / \ max max / \ / \ 3 a b 3 The problem is that both branches are treated the same: descending in the left branch will prune the constant, and then descending the right branch will prune the constant there as well, because limits[0] wasn't updated to take the change on the left branch into account, and so we still get [3,\infty) as baserange. In order to fix the bug we just disable the marking of redundant expressions when they match the baserange. NIR algebraic opt will clean up the first tree for anyway, hopefully other backends are smart enough to do this also. Cc: "13.0" <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	i965/compiler: Use the new nir_opt_copy_prop_vars pass	Jason Ekstrand	2017-01-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We run this after nir_lower_vars_to_ssa so that as many load/store_var intrinsics as possible before copy_prop_vars executes. This is because the pass isn't particularly efficient (it does a lot of linear walks of a linked list) so we'd like as much of the work as possible to be done before copy_prop_vars runs. Shader DB results on Sky Lake: total instructions in shared programs: 12020290 -> 12013627 (-0.06%) instructions in affected programs: 26033 -> 19370 (-25.59%) helped: 16 HURT: 13 total cycles in shared programs: 137772848 -> 137549012 (-0.16%) cycles in affected programs: 6955660 -> 6731824 (-3.22%) helped: 217 HURT: 237 total loops in shared programs: 3208 -> 3208 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 4112 -> 4057 (-1.34%) spills in affected programs: 483 -> 428 (-11.39%) helped: 2 HURT: 0 total fills in shared programs: 5519 -> 5102 (-7.56%) fills in affected programs: 993 -> 576 (-41.99%) helped: 2 HURT: 0 LOST: 0 GAINED: 0 Broadwell had similar results. On older hardware, the impact isn't as large because they don't advertise GL 4.5. Of the hurt programs, all but one are hurt by a single instruction and the one is hurt by 3 instructions. All of the helped programs, on the other hand, are helped by at least 3 instructions and one kerbal space program shader is helped by 44.59%. The real star of the show, however, is the Gl43CSDof synmark2 benchmark which has two shaders which are cut by 28% and 40% and the over-all runtime performance of the benchmark on my Sky Lake laptop is improved by around 25-30% (it's a bit hard to be exact due to thermal throttling). Reviewed-by: Timothy Arceri <[email protected]>
*	nir: Add a local variable-based copy propagation pass	Jason Ekstrand	2017-01-06	3	-0/+816
\| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/builder: Add a helper for getting the most recently added instruction	Jason Ekstrand	2017-01-06	1	-0/+7
\| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/builder: Add a load_deref_var helper	Jason Ekstrand	2017-01-06	1	-0/+16
\| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/dead_variables: Remove shader-local variables that are only written	Jason Ekstrand	2017-01-06	1	-9/+60
\| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]>
*	nir/dead_variables: Removed shared variables when requested	Jason Ekstrand	2017-01-06	1	-0/+3
\| \| \| \|	Reviewed-by: Timothy Arceri <[email protected]>
*	anv/formats: Use the real format for B4G4R4A4_UNORM_PACK16 on gen8	Jason Ekstrand	2017-01-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because border color is handled pre-swizzle, when we move the alpha channel around in the format, the OPAQUE_BLACK border colors don't work correctly on B4G4R4A4_UNORM_PACK16 with the hack. This fixes the following Vulkan CTS tests on Broadwell: dEQP-VK.pipeline.sampler.view_type.2d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.1d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.2d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.1d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black dEQP-VK.pipeline.sampler.view_type.3d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0" <[email protected]>
*	isl: Mark A4B4G4R4_UNORM as supported on gen8	Jason Ekstrand	2017-01-06	1	-1/+4
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Cc: "13.0" <[email protected]>
*	radv: fix depth transitions with layerCount = VK_REMAINING_ARRAY_LAYERS	Pierre-Loup A. Griffais	2017-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	Interpreting layerCount literally would try to create billions of image views in radv_process_depth_image_inplace(). Signed-off-by: Pierre-Loup A. Griffais <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.	Kenneth Graunke	2017-01-06	10	-364/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Treating everything as scalar arrays allows us to drop a bunch of special case input/output munging all throughout the backend. Instead, we just need to remap the TessLevel components to the appropriate patch URB header locations in remap_patch_urb_offsets(). We also switch to treating the TES input versions of these as ordinary shader inputs rather than system values, as remap_patch_urb_offsets() just makes everything work out without special handling. This regresses one Piglit test: arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_SHADER-array-at-limit The compiler starts promoting the constant arrays assigned to gl_TessLevel* to uniform arrays. Since the shader also has a uniform array that uses the maximum number of uniform components, this puts it over the uniform component limit enforced by the linker. This is arguably a bug in the constant array promotion code (it should avoid pushing us over limits), but is unlikely to penalize any real application. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Inline store_output helper in quads workaround code.	Kenneth Graunke	2017-01-06	1	-14/+10
\| \| \| \| \| \| \| \| \|	It's only used in one place, it ignores the offset parameter currently, and I want to add more parameters...at which point, passing in a bunch of integers seems less obvious than writing it out. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	nir: Make glsl_to_nir compact scalar TessLevel arrays.	Kenneth Graunke	2017-01-06	1	-1/+12
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Make unify_interfaces not spread VARYING_BIT_TESS_LEVEL_*.	Kenneth Graunke	2017-01-06	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \|	This is harmless today because gl_TessLevelInner/Outer in the TES is currently treated as system values. However, when we move to treating them as inputs, this would cause a bug: with no TCS present, it would propagate TES reads of VARYING_SLOT_TESS_LEVEL into the VS output VUE map slots. This is totally bogus - those don't even exist in the VS. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	glsl: Support gl_TessLevelInner/Outer[] as TES input variables.	Kenneth Graunke	2017-01-06	2	-4/+17
\| \| \| \| \| \| \| \| \| \|	Upcoming reworks in i965 are going to make it easy to handle this like any other input. Having it as a system value will just require additional code for no benefit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	glsl: Mark whole variable used for ClipDistance and TessLevel*.	Kenneth Graunke	2017-01-06	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's no point in trying to mark partial array access for gl_ClipDistance, gl_TessLevelOuter, or gl_TessLevelInner - they're special built-in variables that control fixed function hardware, and will likely be used in an all-or-nothing fashion. Since these arrays only occupy 1-2 varying slots, we have to avoid our normal processing which increments the slot value by the array index. (I wrote this code before i965 switched from ir_set_program_inouts to nir_shader_gather_info. It's not used by anyone today, and I'm not sure how valuable it is...the alternative to GLSL IR lowering is NIR compact arrays, at which point you should use nir_gather_info.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	glsl: Override the # of varying slots for ClipDistance and TessLevel*.	Kenneth Graunke	2017-01-06	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, this shouldn't have any effect, as all drivers use LowerClipDist and LowerTessFactors to turn the float[] arrays into vectors. However, it should help make it possible for drivers to avoid that lowering. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	glsl: Create and use a new ir_variable::count_attribute_slots() wrapper.	Kenneth Graunke	2017-01-06	5	-11/+17
\| \| \| \| \| \| \| \| \|	This wraps glsl_type::count_attribute_slots(), but will soon contain a couple of overrides for a couple of GLSL built-ins variables. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	gallium/radeon: use the internal clear_buffer callback to fix r600g	Marek Olšák	2017-01-06	1	-1/+3
\| \| \| \| \| \| \| \|	r600g doesn't set pipe_context::clear_buffer. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99303 Reviewed-by: Alex Deucher <[email protected]>
*	llvmpipe: do transpose/untwiddle after conversion for 8bit formats	Roland Scheidegger	2017-01-06	1	-7/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generally we should do tranpose after conversion, if the format has less than 32 bits per channel (if it has 32 bits, conversion is going to be a no-op anyway...). This is obviously because there's less vectors to deal with. Though the advantage for 16 bit formats isn't that big, and in fact with AVX there isn't really any (as the 32bit unpacks can be done with 256bit, but the smaller ones cannot, although that would change again with proper AVX2 support). Only makes sense for 2d and not 1d cases. And to keep things easy, only handle 1,2 and 4 channels (rgbx is just fine). For rgba unorm8 format the backend conversion sums up to these instruction totals (not counting the movs for SSE2 due to 2-op syntax - generally every 2 unpacks need an additional mov). SSE2 AVX transpose: 32 unpack 16 unpack untwiddle: 0 8 (128bit low/high permutes) convert: 16 mul + 16 cvt 8 mul + 8 cvt 32->8bit: 12 pack 8 (128bit extract) + 12 pack When doing transpose/untwiddle afterwards we get: convert: 16 mul + 16 cvt 8 mul + 8 cvt 32->8bit: 12 pack 8 (128bit extract) + 12 pack transpose/untwiddle 12 unpack 12 unpack So for SSE2, this drops 20 unpacks (total instruction count 76->56) whereas for AVX it replaces the 16 256bit unpacks with 8 128bit ones and drops the 8 lo/hi permutes (in total 60->48). (Albeit to be fair, the permutes could be dropped even when doing the transpose first, they are extremely pointless but we'd need to be able to tell lp_build_conv to reorder the vectors, for AVX2 we're going to need to be able to tell lp_build_conv about ordering in any case.) (With different ordering going into conversion, it would be possible to do 4 unpacks + 4 pshufbs instead of 12 unpacks, but that might not be better, and not all cpus can do it. Proper AVX2 support should eliminate the 8 128bit extracts, reduce these 12 packs to 6 and the 12 unpacks to 2 pshufb + 2 permq ideally (+ 2 final 128bit extracts).) Reviewed-by: Jose Fonseca <[email protected]>
*	gallivm: generalize 4x4f->1x16ub special case conversion	Roland Scheidegger	2017-01-06	1	-56/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This special packing path can be easily extended to handle not just float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8 (i.e. all interesting cases for llvmpipe fs backend code). The packing parts all stay the same (only the last step packing will be signed->signed instead of signed->unsigned but luckily even sse2 can do both). While here also note some bugs with that (we keep the bugs identical to what we did before on x86, albeit other archs may differ). In particular float->unorm8 too large values will still get clamped to 0, not 255, and for float->snorm8 NaNs will end up as -1, not 0 (but we do the clamp against 1.0 there to prevent too large values ending up as -1.0 - this is inconsistent to unorm8 handling but is what we ended up before, I'm not sure we can get away without it). This is quite fishy in any case as we depend on arch-dependent behavior of the iround (my understanding is in fact with altivec the conversion would actually saturate although I've no idea about NaNs, so probably wouldn't need to do anything for snorm). (There are only minimal piglit tests for unorm clamping behavior AFAICT, in particular nothing seems to test values which are too large to be handled by the float->int conversion.) For uint32->uint8 we also do a min against MAX_INT, since the source for the packs is always signed (again, on x86 - should probably be able to express these arch-dependent bits better some day). Reviewed-by: Jose Fonseca <[email protected]>
*	llvmpipe: use alpha from already converted color if possible	Roland Scheidegger	2017-01-06	2	-18/+54
\| \| \| \| \| \| \| \| \| \| \| \| \|	For rgbx formats, there is no point in doing alpha conversion again (and with different tranpose even, so llvm can't eliminate it). Albeit it looks like there's some minimal changes needed in the blend code (found by code inspection, no test seemed to complain) if we do this - the blend factors are already sanitized if we have no destination alpha, however for src_alpha_saturate it looks like it still might make a difference (note that we forced has_alpha to true before for some formats and nothing complained, but this seems safer). Reviewed-by: Jose Fonseca <[email protected]>
*	llvmpipe: use scalar load instead of vectors for small vectors in fs backend	Roland Scheidegger	2017-01-06	1	-6/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm has _huge_ problems trying to load things like <4 x i8> vectors and stitching such loads together to form 128bit vectors. My understanding of the problem is that the type legalizer tries to extend that to really a <4 x i32> vector and not a <16 x i8> vector with the 4 elements first then followed by padding, so the shuffles for then combining things together are more or less impossible - you can in fact see the pmovzxd llvm generates. Pre-4.0 llvm just gives up on it completely and does a 30+ pextrb/pinsrb sequence instead. It looks like current llvm has fixed this behavior (my guess would be due to better shuffle combination and load/shuffle folds), but we can avoid this by just loading as <1 x i32> values, combine that and only cast at the end. (I suspect it might also work if we'd pad the loaded vectors immediately before shuffling them together, instead of directly stitching 2 such vectors together pairwise before combining the pair. But this _might_ lose the ability to load the values directly into their right place in the vector with pinsrd.). But using 32bit values is probably easier for llvm as it will never give it funny ideas how the vector should look like. (This is possibly only a problem for 1x8bit formats, since 2x8bit will end up fetching 64bit hence only two vectors are stitched together, not 4, but we use the same strategy anyway.) Reviewed-by: Jose Fonseca <[email protected]>
*	i965: Enable several GLES 3.1 extensions on HSW+	Ian Romanick	2017-01-06	3	-6/+9
\| \| \| \| \| \| \| \| \|	The only reason we didn't previously enable this was the dependency on OpenGL ES 3.1. These should have been enabled as soon as HSW got stencil texturing. We also needed to fixup setting MaxViewports. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Always set MaxViewports and related limits	Ian Romanick	2017-01-06	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	Since 9d6ca7c3, there should be no performance hit for having MaxViewports > 1. Always set this context state. This eliminates the need to update this conditional as we add support for OES_viewport_array on older GPUs. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	winsys/amdgpu: fix a race condition between fence updates and IB submissions	Marek Olšák	2017-01-06	2	-18/+22
\| \| \| \| \| \| \| \| \| \|	The CS thread is needed to ensure proper ordering of operations and can't be disabled (without complicating the code). Discovered by Nine CSMT, which ended up in a deadlock. Acked-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add TC L2 prefetch for shaders and VBO descriptors	Marek Olšák	2017-01-06	3	-1/+50
\| \| \| \| \|	Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add CP DMA flags for greater control over synchronization	Marek Olšák	2017-01-06	3	-16/+31
\| \| \| \| \| \| \|	for L2 prefetch Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: cleanly communicate which CP DMA packet is first	Marek Olšák	2017-01-06	1	-11/+21
\| \| \| \| \|	Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: add new HUD query num-SDMA-IBs	Marek Olšák	2017-01-06	9	-2/+21
\| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	gallium/radeon: rename the num-ctx-flushes query to num-GFX-IBs	Marek Olšák	2017-01-06	9	-14/+14
\| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: add HUD queries for cache flush stats	Marek Olšák	2017-01-06	4	-0/+32
\| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
*	radeonsi: don't count fast clears and prefetches into CP DMA stats	Marek Olšák	2017-01-06	1	-2/+6
\| \| \| \| \|	Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>