mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	i965: Remove unused hw_must_use_separate_stencil	Ben Widawsky	2016-01-13	3	-5/+1
\| \| \| \| \| \| \| \| \| \|	I spotted this while looking for what needs updating in future platforms. I'm too lazy to go through the git logs, but it was probably missed by Jason when all the brw refactoring happened. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Drop extra newline from shader compile messages.	Matt Turner	2016-01-13	2	-2/+2
\| \| \| \| \|	Ilia changed shader-db's run.c to not expect messages to contain a newline in shader-db commit 51bbc8035.
*	glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.	Kenneth Graunke	2016-01-13	4	-29/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	TGSI doesn't use these - it just translates ir_quadop_bitfield_insert directly. NIR can handle ir_quadop_bitfield_insert as well. These opcodes were only used for i965, and with Jason's recent patches, we can do this lowering in NIR (which also gains us SPIR-V handling). So there's not much point to retaining this GLSL IR lowering code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/fs: Skip assertion on NaN.	Matt Turner	2016-01-13	1	-1/+2
\| \| \| \| \| \| \| \| \|	A shader in Unreal4 uses the result of divide by zero in its color output, producing NaN and triggering this assertion since NaN is not equal to itself. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560 Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/fs: Add debugging to constant combining pass.	Matt Turner	2016-01-13	1	-1/+20
\| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	meta: remove const qualifier on _mesa_meta_fb_tex_blit_begin()	Brian Paul	2016-01-13	2	-2/+2
\| \| \| \| \| \|	To silence a compiler warning about a const/non-const mismatch. Reviewed-by: Ian Romanick <[email protected]>
*	i965/gen9: Don't allow the RGBX formats for texturing/rendering	Neil Roberts	2016-01-13	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RGBX surface formats aren't renderable so we internally remap them to RGBA when rendering. They are retained as RGBX when used as textures. However since the previous patch fast clears are disabled for surfaces that use a different format for rendering than for texturing. To avoid this situation we can just pretend not to support RGBX formats at all. This will cause the upper layers of mesa to pick an RGBA format internally instead. This should be safe because we always override the alpha component to 1.0 for RGBX in the texture swizzle anyway. We could also do this for all gens except that it's a bit more difficult when the hardware doesn't support texture swizzling. Gens using the blorp have further problems because that doesn't implement this swizzle override. Reviewed-by: Anuj Phogat <[email protected]>
*	i965: Mark TCS URB writes as having side effects.	Kenneth Graunke	2016-01-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds barrier dependencies around TCS_OPCODE_URB_WRITE, preventing reads and writes from being incorrectly scheduled. Fixes rendering in GFXBench 4.0's tessellation demo. For some reason, we haven't ever listed URB writes as having side-effects. This hasn't been a problem because in most stages, we never read from the URB, and only write to each location once. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93526 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	meta: Unconditionally set GL_SKIP_DECODE_EXT	Ian Romanick	2016-01-11	2	-11/+4
\| \| \| \| \| \| \| \| \| \| \| \|	The path that depends on this will be avoided (by fallback_required) if the extension is not supported. _mesa_set_sampler_srgb_decode does not generate GL errors (by design), so there are no problems there. I kept this change separate and last because it is one of the few in the series that is not a candidate for the stable branch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta: Only bind the sampler in one place	Ian Romanick	2016-01-11	2	-8/+4
\| \| \| \| \| \| \| \| \| \| \|	All of the calls after the first _mesa_bind_sampler call are DSA style calls that don't depend on the current binding. I kept this change separate and last because it is one of the few in the series that is not a candidate for the stable branch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/decompress: Don't pollute the sampler object namespace	Ian Romanick	2016-01-11	1	-7/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tl;dr: For many types of GL object, we can NEVER use the Gen function. In OpenGL ES (all versions!) and OpenGL compatibility profile, applications don't have to call Gen functions. The GL spec is very clear about how you can mix-and-match generated names and non-generated names: you can use any name you want for a particular object type until you call the Gen function for that object type. Here's the problem scenario: - Application calls a meta function that generates a name. The first Gen will probably return 1. - Application decides to use the same name for an object of the same type without calling Gen. Many demo programs use names 1, 2, 3, etc. without calling Gen. - Application calls the meta function again, and the meta function replaces the data. The application's data is lost, and the app fails. Have fun debugging that. Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363 Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/decompress: Save and restore the sampler using gl_sampler_object ↵	Ian Romanick	2016-01-11	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \|	instead of GL API object handle Some meta operations can be called recursively. Future changes (the "Don't pollute the ... namespace" changes) will cause objects with invalid names to be used. If a nested meta operation tries to restore an object named 0xDEADBEEF, it will fail. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/decompress: Track sampler using gl_sampler_object instead of GL API ↵	Ian Romanick	2016-01-11	2	-12/+12
\| \| \| \| \| \| \|	object handle Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/decompress: Use internal functions for sampler object access	Ian Romanick	2016-01-11	1	-4/+9
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/generate_mipmap: Don't pollute the sampler object namespace	Ian Romanick	2016-01-11	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tl;dr: For many types of GL object, we can NEVER use the Gen function. In OpenGL ES (all versions!) and OpenGL compatibility profile, applications don't have to call Gen functions. The GL spec is very clear about how you can mix-and-match generated names and non-generated names: you can use any name you want for a particular object type until you call the Gen function for that object type. Here's the problem scenario: - Application calls a meta function that generates a name. The first Gen will probably return 1. - Application decides to use the same name for an object of the same type without calling Gen. Many demo programs use names 1, 2, 3, etc. without calling Gen. - Application calls the meta function again, and the meta function replaces the data. The application's data is lost, and the app fails. Have fun debugging that. Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363 Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/generate_mipmap: Save and restore the sampler using gl_sampler_object ↵	Ian Romanick	2016-01-11	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \|	instead of GL API object handle Some meta operations can be called recursively. Future changes (the "Don't pollute the ... namespace" changes) will cause objects with invalid names to be used. If a nested meta operation tries to restore an object named 0xDEADBEEF, it will fail. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/generate_mipmap: Track sampler using gl_sampler_object instead of GL ↵	Ian Romanick	2016-01-11	2	-14/+17
\| \| \| \| \| \| \|	API object handle Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/generate_mipmap: Use internal functions for sampler object access	Ian Romanick	2016-01-11	1	-9/+11
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/blit: Don't pollute the sampler object namespace in ↵	Ian Romanick	2016-01-11	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	_mesa_meta_setup_sampler tl;dr: For many types of GL object, we can NEVER use the Gen function. In OpenGL ES (all versions!) and OpenGL compatibility profile, applications don't have to call Gen functions. The GL spec is very clear about how you can mix-and-match generated names and non-generated names: you can use any name you want for a particular object type until you call the Gen function for that object type. Here's the problem scenario: - Application calls a meta function that generates a name. The first Gen will probably return 1. - Application decides to use the same name for an object of the same type without calling Gen. Many demo programs use names 1, 2, 3, etc. without calling Gen. - Application calls the meta function again, and the meta function replaces the data. The application's data is lost, and the app fails. Have fun debugging that. Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363 Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/blit: Save and restore the sampler using gl_sampler_object instead of ↵	Ian Romanick	2016-01-11	2	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GL API object handle Some meta operations can be called recursively. Future changes (the "Don't pollute the ... namespace" changes) will cause objects with invalid names to be used. If a nested meta operation tries to restore an object named 0xDEADBEEF, it will fail. v2: Add a comment explaining why samp_obj_save is set to NULL in _mesa_meta_fb_tex_blit_begin. This came out of review feedback from Jason. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/blit: Use internal functions for sampler object access	Ian Romanick	2016-01-11	3	-19/+25
\| \| \| \| \| \| \| \|	This requires tracking the sampler object using the gl_sampler_object* instead of the object name. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	meta/blit: Group the SamplerParameteri calls with the other sampler operations	Ian Romanick	2016-01-11	1	-4/+4
\| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Upload 3DSTATE_BINDING_TABLE_POINTERS_HS when !TCS on Gen9+.	Kenneth Graunke	2016-01-11	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen9+ requires us to emit 3DSTATE_BINDING_TABLE_POINTERS_HS for the hull shader push constants to take effect. The passthrough TCS uses push constants for the default tessellation levels. So, when those change, we need to re-upload the binding table as well. Fixes five Piglit tests on Skylake: - spec/arb_tessellation_shader/vs-tes-vertex - spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-quads - spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-tris - spec/arb_tessellation_shader/tes-read-texture - spec/arb_tessellation_shader/tess_with_geometry Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	Add missing platform information for KBL	Mark Janes	2016-01-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In testing KBL, I found: - urb size was not set for slices gt1.5, gt2, and gt3. The value I used for these slices (384) was taken from an earlier patch authored by Ben Widawsky. - slice count was missing. This field was added by a403ad4f5a034e52a3cd845e91c4aa3e6927b731 With this commit, KBL passes piglit at parity with SKL. Note: As requested by Kristian, Sarah modified this patch to drop setting urb size for gt1.5, gt2, and gt3, since the correct default is set in the GEN9 macro by commit c1e38ad37042b0ec261eb0ba5631b7ff0ee7a9da "i965/skl: Use larger URB size where available." Signed-off-by: Mark Janes <[email protected]> Signed-off-by: Sarah Sharp <[email protected]> Reviewed-by: Kristian Høgsberg Kristensen <[email protected]> Cc: "11.1" <[email protected]>
*	glsl: Move _mesa_shader_stage_to_string/abbrev to shader_enums.c	Kristian Høgsberg Kristensen	2016-01-08	4	-4/+0
\| \| \| \| \| \| \|	These are used by code that doesn't necessarily link to libglsl.la. Move them to shader_enums.[ch] where we keep similar helpers. Reviewed-by: Matt Turner <[email protected]>
*	i965: Move GLSL lowering passes out of libi965_compiler.la	Kristian Høgsberg Kristensen	2016-01-08	1	-5/+5
\| \| \| \| \| \| \| \| \|	The scope of libi965_compiler.la is to be able to take nir shaders and generate i965 EU code. As such, we don't want the GLSL IR lowering passes in the library. With this change, libi965_compiler.la no longer needs to link to libglsl.la. Reviewed-by: Matt Turner <[email protected]>
*	i965/compiler: Enable more lowering in NIR	Jason Ekstrand	2016-01-07	1	-0/+7
\| \| \| \| \| \| \|	We don't need these for GLSL or ARB, but we need them for SPIR-V Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: use _mesa_delete_buffer_object	Nicolai Hähnle	2016-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is more future-proof, plugs the memory leak of Label and properly destroys the buffer mutex. Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0 11.1" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i915: use _mesa_delete_buffer_object	Nicolai Hähnle	2016-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is more future-proof, plugs the memory leak of Label and properly destroys the buffer mutex. Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0 11.1" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	radeon: use _mesa_delete_buffer_object	Nicolai Hähnle	2016-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is more future-proof, plugs the memory leak of Label and properly destroys the buffer mutex. Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0 11.1" <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	mesa: Add KBL PCI IDs and platform information.	Sarah Sharp	2016-01-06	1	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add PCI IDs for the Intel Kabylake platforms. The IDs are taken directly from the Linux kernel patches, which are under review: http://lists.freedesktop.org/archives/intel-gfx/2015-October/078967.html http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=kbl-upstream-v2 The Kabylake PCI IDs taken from the kernel are rearranged to be in order of GT type, then PCI ID. Please note that if this patch is backported, the following fixes will need to be added before this patch: commit 28ed1e08e8ba98e "i965/skl: Remove early platform support" commit c1e38ad37042b0e "i965/skl: Use larger URB size where available." Thanks to Ben for fixing a bug around setting urb.size, and being patient with my questions about what the various fields mean. Signed-off-by: Sarah Sharp <[email protected]> Suggested-by: Ben Widawsky <[email protected]> Tested-by: Rodrigo Vivi <[email protected]> (KBL-GT2) Cc: "11.1" <[email protected]>
*	nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.	Kenneth Graunke	2016-01-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The nir_opt_algebraic rule (('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))), can produce new fdiv operations, which need to be lowered on i965, as we don't actually implement fdiv. (Normally, we handle this in GLSL IR's lower_instructions pass, but in the above case we introduce an fdiv after that point. So, make NIR do it for us.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
*	i965: Only turn on ARB_compute_shader if we can write registers.	Kenneth Graunke	2016-01-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compute shaders require reconfiguring the L3 for shared local memory support. We have to be able to write the L3 registers to do that. This effectively turns off compute shaders prior to Kernel 4.2. (Previously, the extension enable was in an API_OPENGL_CORE conditional. However, that isn't necessary - core Mesa extension handling already restricts it properly. I've moved it out in this patch.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
*	i965: Use rcp in brw_lower_texture_gradients rather than 1.0 / x.	Kenneth Graunke	2016-01-05	1	-1/+1
\| \| \| \| \| \| \| \|	That's what it's for. Plus, we actually implement rcp. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/gen9: Modify the conditions to use blitter on skl+	Anuj Phogat	2016-01-05	1	-3/+9
\| \| \| \| \| \| \| \| \|	Conditions modified allow skl+ to use blitter: - for all tiling formats - to write data to YF/YS tiled surfaces Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/gen9: Return false in place of assert in intelEmitCopyBlit()	Anuj Phogat	2016-01-05	1	-3/+4
\| \| \| \| \| \| \|	This allows the fallback paths to handle it correctly. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/gen9: Remove regions overlap check in fast copy blit	Anuj Phogat	2016-01-05	1	-5/+0
\| \| \| \| \| \| \| \|	Overlapping blits are anyway undefined in OpenGL. So no need of overlap check here. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965/gen9: Don't use fast copy blit in case of non power of 2 cpp	Anuj Phogat	2016-01-05	1	-2/+4
\| \| \| \| \| \| \|	Fast copy blit is currently enabled for use only with Yf/Ys tiling. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i915/i965: Fix typo in perf_debug message	Ian Romanick	2016-01-05	2	-2/+2
\| \| \| \| \| \|	Trivial Signed-off-by: Ian Romanick <[email protected]>
*	i965: quieten compiler warning about out-of-bounds access	Ilia Mirkin	2016-01-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gcc 4.9.3 shows the following error: brw_vue_map.c:260:20: warning: array subscript is above array bounds [-Warray-bounds] return brw_names[slot - VARYING_SLOT_MAX]; This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum type. Adding an assert will generate no additional code but will teach the compiler to not complain. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
*	i965/wm: use binding size for ubo/ssbo when automatic size is unset	Ilia Mirkin	2016-01-05	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the same tests that commit 8cf2e892f was attempting to fix: ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize as confirmed by Samuel. Signed-off-by: Ilia Mirkin <[email protected]> Cc: Samuel Iglesias Gonsálvez <[email protected]> Cc: Marta Lofstedt <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	Revert "i965/wm: use proper API buffer size for the surfaces."	Ilia Mirkin	2016-01-05	2	-9/+4
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 8cf2e892fca20c4776b4a07c39918343cb2d4e0e. It's entirely bogus to attempt to store anything about the binding in the buffer object itself, which might be bound any number of times. Signed-off-by: Ilia Mirkin <[email protected]> Cc: Samuel Iglesias Gonsálvez <[email protected]> Cc: Marta Lofstedt <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
*	i965/wm: use proper API buffer size for the surfaces.	Samuel Iglesias Gonsálvez	2016-01-04	2	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 5bb5eeea fixes a bug indicating that the surfaces should have the API buffer size. Hovewer it picked the wrong value. This patch adds a new variable, which takes into account glBindBufferRange() values. This patch fixes the following CTS regressions: ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]>
*	nouveau: fix double-const qualifier	Ilia Mirkin	2016-01-03	2	-2/+2
\| \| \| \| \| \| \|	Reported by Tom^ on IRC. The original intent was to mark the pointer constant as well as the data being pointed to, so move the *. Signed-off-by: Ilia Mirkin <[email protected]>
*	nir: extract out helper macros for running passes	Rob Clark	2016-01-03	1	-36/+9
\| \| \| \| \| \| \| \| \|	Note these are a bit uglier, due to avoidance of GNU C extensions. But drivers which do not need to be built with compilers that don't support the extension can wrap these macros with their own. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Make TCS precompile use the TES primitive mode when available.	Kenneth Graunke	2016-01-02	1	-1/+3
\| \| \| \| \| \| \| \|	If there's a linked TES program, we should just use the actual primitive mode. If not, just guess triangles (as we did before). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Push most TES inputs in SIMD8 mode.	Kenneth Graunke	2016-01-02	1	-12/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using the push model for inputs is much more efficient than pulling inputs - the hardware can simply copy a large chunk into URB registers at thread creation time, rather than having the thread send messages to request data from the L3 cache. Unfortunately, it's possible to have more TES inputs than fit in registers, so we have to fall back to the pull model in some cases. However, it turns out that most tessellation evaluation shaders are fairly simple, and don't use many inputs. An arbitrary cut-off of 32 vec4 slots (16 registers) is more than sufficient to ensure that 100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven, GPUTest/TessMark, and SynMark. Note that unlike most SIMD8 stages, this actually reads packed vec4 data, since that is what our vec4 TCS programs write. Improves performance in GPUTest's tessmark_x64 microbenchmark by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768. Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by 22.74% +/- 0.309394% (n = 5). Improves performance in Shadow of Mordor at low settings with tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4). shader-db statistics for files containing tessellation shaders: total instructions in shared programs: 184358 -> 181181 (-1.72%) instructions in affected programs: 27971 -> 24794 (-11.36%) helped: 226 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Use LOAD_PAYLOAD for SIMD8 TES input loads, not MOV.	Kenneth Graunke	2016-01-02	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the message payload is a single register, MOV seemed more sensible than LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can. All input loads can use the same header - we don't need to re-expand g0 every time. CSE accomplishes this, saving instructions. shader-db statistics for files containing tessellation shaders: total instructions in shared programs: 186923 -> 184358 (-1.37%) instructions in affected programs: 30536 -> 27971 (-8.40%) helped: 226 HURT: 0 total cycles in shared programs: 1009850 -> 1005356 (-0.45%) cycles in affected programs: 168206 -> 163712 (-2.67%) helped: 226 HURT: 0 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Move 3-src subnr swizzle handling into the vec4 backend.	Kenneth Graunke	2016-01-02	2	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While most align16 instructions only support a SubRegNum of 0 or 4 (using swizzling to control the other channels), 3-src instructions actually support arbitrary SubRegNums. When the RepCtrl bit is set, we believe it ignores the swizzle and uses the equivalent of a <0,1,0> region from the subnr. In the past, we adopted a vec4-centric approach of specifying subnr of 0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper SubRegNum. This isn't a great fit for the scalar backend, where we don't set swizzles at all, and happily set subnrs in the range [0, 7]. This patch changes brw_eu_emit.c to use subnr and swizzle directly, relying on the higher levels to set them sensibly. This should fix problems where scalar sources get copy propagated into 3-src instructions in the FS backend. I've only observed this with TES push model inputs, but I suppose it could happen in other cases. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.	Kenneth Graunke	2015-12-30	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't specify which fragment shader output is which, so there's at best a 50/50 chance of us guessing it correctly. This is invalid. Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago, but hasn't actually released them for whatever reason. So, add the workaround back so that it works for most people. Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge. For whatever reason, Broadwell worked. 4.1 and 1.1 have always worked. Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233 Reviewed-by: Marek Olšák <[email protected]> Cc: [email protected]