mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	glsl: Add new atomic_uint built-in GLSL type.	Francisco Jerez	2013-10-29	6	-0/+9
\| \| \| \| \| \| \| \| \|	v2: Fix GLSL version in which the type became available. Add contains_atomic() convenience method. Split off atomic counter comparison error checking to a separate patch that will handle all opaque types. Include new ir_variable fields for atomic types. Reviewed-by: Ian Romanick <[email protected]>
*	mesa: Add support for ARB_shader_atomic_counters.	Francisco Jerez	2013-10-29	10	-3/+263
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the common support code required for the ARB_shader_atomic_counters extension. It defines the necessary data structures for tracking atomic counter buffer objects (from now on "ABOs") associated with some specific context or shader program, it implements support for binding buffers to an ABO binding point and querying the existing atomic counters and buffers declared by GLSL shaders. v2: Fix extension checks. Drop unused MAX_ATOMIC_BUFFERS constant. Acked-by: Paul Berry <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	glapi: Add support for ARB_shader_atomic_counters.	Francisco Jerez	2013-10-29	3	-1/+10
\| \| \| \| \| \| \| \|	Add XML file for the dispatch code generator, update the dispatch_sanity test and add stub definition for the new entry point. Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Handle deallocation of some private ralloc contexts explicitly.	Francisco Jerez	2013-10-29	4	-4/+4
\| \| \| \| \| \| \| \| \|	These ralloc contexts belong to a specific object and are being deallocated manually from the class destructor. Now that we've hooked up destructors to ralloc there's no reason for them to be children of any other context, and doing so might lead to double frees under some circumstances. The class destructor has all the responsibility of freeing class memory resources now.
*	mesa: Define introspection macro to determine whether a type is trivially destructible.	Francisco Jerez	2013-10-29	1	-1/+23
\| \| \| \| \| \| \| \| \| \|	Only implemented on GCC and Clang for now. Other compilers use a dummy implementation that always returns false, which should be a safe [but slightly inefficient] assumption in all cases. Reviewed-by: Ian Romanick <[email protected]>
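To illustrate the property this macro reports, here is a minimal sketch that uses the standard C++11 trait std::is_trivially_destructible rather than the compiler-specific macro the commit adds. The helper name needs_destructor_call and both example structs are hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: true when an allocator has to remember to run T's
// destructor before releasing the memory, false when it can simply free it.
template <typename T>
constexpr bool needs_destructor_call()
{
   return !std::is_trivially_destructible<T>::value;
}

// Plain data: destroying it is a no-op.
struct plain_data {
   int x;
   float y;
};

// Owns a std::string, so its destructor must run to release the buffer.
struct owns_string {
   std::string name;
};

static_assert(!needs_destructor_call<plain_data>(),
              "plain data can be freed without a destructor call");
static_assert(needs_destructor_call<owns_string>(),
              "a type owning a std::string needs its destructor run");
```

A fallback that always answers "not trivially destructible" is safe in this model: the allocator would only do redundant bookkeeping and would never skip a destructor that has to run.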
*	glsl: Generalize MSVC fix for strcasecmp().	Paul Berry	2013-10-29	1	-0/+1
\| \| \| \| \| \| \| \|	This will let us use strcasecmp() from anywhere inside Mesa without having to worry about the fact that it doesn't exist in MSVC. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	st/mesa: move out of memory check in st_draw_vbo()	Brian Paul	2013-10-29	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	Before we were only checking the st->vertex_array_out_of_memory flag after updating array state. But if there's two consecutive glDrawArrays calls and the first one is skipped because of OOM, the second one should be skipped too. Cc: 9.2 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
*	i965/vec4: Reduce working set size of live variables computation.	Eric Anholt	2013-10-29	2	-23/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Orbital Explorer was generating a 4000 instruction geometry shader, which was taking 275 trips through dead code elimination and register coalescing, each of which updated live variables to get its work done, and invalidated those live variables afterwards. By using bitfields instead of bools (reducing the working set size by a factor of 8) in live variables analysis, it drops from 88% of the profile to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul says 3+ minutes) to 10.5 seconds. Compare to f179f419d1d0a03fad36c2b0a58e8b853bae6118 on the FS side. Reviewed-by: Paul Berry <[email protected]>
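The factor-of-8 saving comes from storing one bit per variable instead of one bool (typically one byte). A minimal sketch of such a liveness set follows; the class live_set and its methods are hypothetical names and are not taken from the i965 sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical liveness set: one bit per variable, packed into 32-bit words.
class live_set {
public:
   explicit live_set(std::size_t num_vars)
      : num_vars_(num_vars), words_((num_vars + 31) / 32, 0u) {}

   void set(std::size_t var)
   {
      assert(var < num_vars_);
      words_[var / 32] |= (1u << (var % 32));
   }

   bool test(std::size_t var) const
   {
      assert(var < num_vars_);
      return ((words_[var / 32] >> (var % 32)) & 1u) != 0;
   }

   // OR another set into this one, 32 variables per operation. Returns true
   // if any bit changed, which is what a dataflow fixed-point loop checks.
   bool merge(const live_set &other)
   {
      assert(other.num_vars_ == num_vars_);
      bool changed = false;
      for (std::size_t i = 0; i < words_.size(); i++) {
         const std::uint32_t merged = words_[i] | other.words_[i];
         changed = changed || (merged != words_[i]);
         words_[i] = merged;
      }
      return changed;
   }

   std::size_t storage_bytes() const
   {
      return words_.size() * sizeof(std::uint32_t);
   }

private:
   std::size_t num_vars_;
   std::vector<std::uint32_t> words_;
};
```

For 4000 variables this set occupies 500 bytes, against 4000 bytes for an array of one-byte bools, and each merge step touches 125 words rather than 4000 elements.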
*	Remove error when calling glGenQueries/glDeleteQueries while a query is active	Carl Worth	2013-10-28	1	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is nothing in the OpenGL specification which prevents the user from calling glGenQueries to generate a new query object while another object is active. Neither is there anything in the Mesa implementation which prevents this. So remove the INVALID_OPERATION errors in this case. Similarly, it is explicitly allowed by the OpenGL specification to delete an active query, so remove the assertion for that case, replacing it with the necessary state updates to end the query (clear the bindpt pointer and call into the driver's EndQuery hook). CC: <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]>
*	i965: Also emit HiZ and Stencil packets when disabling depth on Gen6.	Kenneth Graunke	2013-10-28	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \|	The normal drawing path does this, and it's necessary on Ivybridge, so let's try it on Sandybridge too. It's not explicitly documented as necessary, but might help with hangs. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Also emit HIER_DEPTH and STENCIL packets when disabling depth.	Kenneth Graunke	2013-10-28	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From the documentation: "[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS, 3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)." We normally do this, but BLORP was failing to do so in the case where it disables depth. Not observed to fix anything yet. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Move post-sync non-zero flush for 3DSTATE_MULTISAMPLE.	Kenneth Graunke	2013-10-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	For some reason, we put the flush in the caller, rather than just before emitting the packet. This is more than a cosmetic problem: BLORP calls gen6_emit_3dstate_multisample() directly, and so it missed the flush. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Also guard 3DSTATE_DRAWING_RECTANGLE with a flush in blorp.	Kenneth Graunke	2013-10-28	1	-0/+3
\| \| \| \| \| \| \| \| \|	Non-pipelined commands need this flush. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Emit post-sync non-zero flush before 3DSTATE_DRAWING_RECTANGLE.	Kenneth Graunke	2013-10-28	1	-0/+4
\| \| \| \| \| \| \| \| \|	This is another non-pipelined command that needs a flush on Sandybridge. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Emit post-sync non-zero flush before 3DSTATE_GS_SVB_INDEX.	Kenneth Graunke	2013-10-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From the comments above intel_emit_post_sync_nonzero_flush: "[DevSNB-C+{W/A}] Before any depth stall flush (including those produced by non-pipelined state commands), software needs to first send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0." This suggests that every non-pipelined (0x79xx) command needs a post-sync non-zero flush before it. Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: CS writes/reads should use I915_GEM_INSTRUCTION	Daniel Vetter	2013-10-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise the gen6 w/a in the kernel won't kick in and the write will land nowhere. Inspired by a patch Ken pointed me at which had the same issue (but isn't yet merged and also for a gen7+ feature). An audit of the entire driver didn't reveal any other case than the one in the write_reg helper used by the gen6 queryobj code. Acked-by: Kenneth Graunke <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Tested-by: Xinkai Chen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Cc: "9.2" <[email protected]>
*	i965: Do not set bilinear_filter flag in case of multisample blits	Anuj Phogat	2013-10-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting bilinear_filter flag in case of multisample blits with GL_LINEAR filter causes incorrect behavior in translate_dst_to_src() function. This broke Modern Warfare (1, 2 and 3) on SNB, IVB and HSW. Tested on SNB and IVB, no Piglit regressions. Trace file of the game (taken with apitrace) works fine with this patch. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69078 Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Reported-by: Armin K <[email protected]> Tested-by: Armin K <[email protected]> Reviewed-by: Paul Berry <[email protected]>
*	mesa: Remove trailing whitespace in texparam.c	Rico Schüller	2013-10-28	1	-6/+6
\| \| \| \| \|	Signed-off-by: Rico Schüller <[email protected]> Signed-off-by: Brian Paul <[email protected]>
*	mesa: use void in _mesa_VDPAUFiniNV() as in the header file	Brian Paul	2013-10-28	1	-1/+1
\|
*	i965: Make fs gl_PrimitiveID input work even when there's no gs.	Paul Berry	2013-10-27	2	-5/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a geometry shader is present, the fragment shader gl_PrimitiveID input acts like an ordinary varying, receiving data from the gs gl_PrimitiveID output. When there's no geometry shader, we have to ask the fixed function SF hardware to provide the primitive ID to the fragment shader instead. Previously, the SF setup code would handle this situation by recognizing that the FS gl_PrimitiveID input didn't match to any VS output; since normally an FS input with no corresponding VS output leads to undefined data, the SF setup code used to just arbitrarily assign it to receive data from attribute 0. This patch changes the SF setup code so that instead of arbitrarily using attribute 0, it assigns the unmatched FS input to receive gl_PrimitiveID. In the case where the FS input really is gl_PrimitiveID, this produces the intended result. In all other cases, no harm is done since GL specifies that the behaviour is undefined. Fixes piglit test primitive-id-no-gs. v2: If an attribute is already being overridden with point coordinates, don't try to also override it with gl_PrimitiveID. This is necessary to avoid regressing piglit tests such as shaders/glsl-fs-pointcoord. Reviewed-by: Eric Anholt <[email protected]>
*	mesa: Add GL_NV_vdpau_interop functions to dispatch_sanity.cpp.	Vinson Lee	2013-10-26	1	-0/+12
\| \| \| \| \| \| \| \|	Fixes 'make check' failures introduced with commit 80964226e9b8a05c39157f9305c06c0b2861e080. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70900 Signed-off-by: Vinson Lee <[email protected]>
*	mesa: add vdpau.c and st_vdpau.c to src/mesa/SConscript	Brian Paul	2013-10-26	1	-0/+2
\| \| \| \|	Fixes SCons build.
*	implement NV_vdpau_interop v7	Christian König	2013-10-26	10	-1/+751
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	v2: Actually implement interop between the gallium state tracker and the VDPAU backend. v3: Make it also available in non legacy contexts, fix video buffer sharing. v4: deny interop if we don't have the same screen object v5: rebased on upstream changes v6: implemented VDPAUGetSurfaceivNV, improved error handling, unregister all surfaces in VDPAUFiniNV v7: squash merge with Marek's changes Signed-off-by: Christian König <[email protected]>
*	i965: Remove ir_txf coord+offset special case in visitors	Chris Forbes	2013-10-26	2	-65/+16
\| \| \| \| \| \| \|	Just let it be handled by the lowering pass. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Generalize coord+offset lowering pass for ir_txf	Chris Forbes	2013-10-26	1	-3/+26
\| \| \| \| \| \| \| \| \| \|	ir_txf expects an ivec* coordinate, and may be larger than ivec2; shuffle things around so that this will work. V2: Fix style nits, use ir_builder Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add lowering pass to fold offset into unnormalized coords	Chris Forbes	2013-10-26	4	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that nonzero offsets with gsampler2DRect don't work -- they just return garbage. Work around this by folding the offset into the coord. Done as an IR pass rather than yet another hack in the visitors because it's clear what's going on this way. Can possibly reuse this to replace the existing txf coord+offset hacks. V2: Use ir_builder Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add lowering pass for splitting textureGatherOffsets	Chris Forbes	2013-10-26	4	-0/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rewrites textureGatherOffsets(s, p, offsets) into gvec4( textureGatherOffset(s, p, offsets[0]).w, textureGatherOffset(s, p, offsets[1]).w, textureGatherOffset(s, p, offsets[2]).w, textureGatherOffset(s, p, offsets[3]).w ) V2: Use ir_builder to be slightly clearer. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
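The rewrite relies on textureGatherOffset returning the i0j0 texel of its 2x2 footprint in the .w component. The toy model below checks that property on the CPU. It is only a sketch: the texture struct, its clamp-to-edge addressing, integer coordinates and the function names are all assumptions made for the illustration, not Mesa code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct ivec2 { int x, y; };
using vec4 = std::array<float, 4>;   // components x, y, z, w

// Toy single-channel texture: texel (x, y) holds 10*y + x, clamped to edge.
struct texture {
   int width, height;

   float fetch(int x, int y) const
   {
      assert(width > 0 && height > 0);
      const int cx = std::min(std::max(x, 0), width - 1);
      const int cy = std::min(std::max(y, 0), height - 1);
      return 10.0f * static_cast<float>(cy) + static_cast<float>(cx);
   }
};

// Model of textureGatherOffset: the 2x2 footprint whose i0j0 corner is
// p + offset, in GLSL component order
// x = (i0, j1), y = (i1, j1), z = (i1, j0), w = (i0, j0).
vec4 gather_offset(const texture &t, ivec2 p, ivec2 offset)
{
   const int i0 = p.x + offset.x;
   const int j0 = p.y + offset.y;
   return { t.fetch(i0, j0 + 1), t.fetch(i0 + 1, j0 + 1),
            t.fetch(i0 + 1, j0), t.fetch(i0, j0) };
}

// The lowered form from the commit message: four single-offset gathers,
// keeping only the .w (i0j0) texel of each.
vec4 gather_offsets_lowered(const texture &t, ivec2 p,
                            const std::array<ivec2, 4> &offsets)
{
   return { gather_offset(t, p, offsets[0])[3],
            gather_offset(t, p, offsets[1])[3],
            gather_offset(t, p, offsets[2])[3],
            gather_offset(t, p, offsets[3])[3] };
}
```

With a 4x4 texture, p = (1, 1) and offsets (0,0), (1,0), (0,1), (-1,-1), the lowered gather returns the texels at (1,1), (2,1), (1,2) and (0,0), one per requested offset.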
*	i965: Add asserts to ensure that ir_tg4 offset arrays are lowered	Chris Forbes	2013-10-26	2	-0/+6
\| \| \| \| \| \| \| \| \|	We don't have a message that does 4 independent offsets; a lowering pass needs to lower it to 4 normal gather4s before reaching this point. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add support for shadow comparitors with gather4	Chris Forbes	2013-10-26	2	-3/+15
\| \| \| \| \| \| \| \|	Note that gather4_po_c's parameters are too long for SIMD16. It might be worth emitting 2xSIMD8 messages in this case at some point. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vs: Add support for shadow comparitors with gather4	Chris Forbes	2013-10-26	2	-3/+16
\| \| \| \| \| \| \| \| \| \|	gather4_c's argument layout is straightforward -- refz just goes on the end. gather4_po_c's layout differs, however -- the array index is replaced with refz. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add Gen7 gather4_c and gather4_po_c message types	Chris Forbes	2013-10-26	1	-0/+2
\| \| \| \| \| \|	Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vs: add support for gather4 with nonconstant offsets	Chris Forbes	2013-10-26	1	-1/+15
\| \| \| \|	Signed-off-by: Chris Forbes <[email protected]>
*	i965/fs: add support for gather4 with nonconstant offsets	Chris Forbes	2013-10-26	1	-7/+46
\| \| \| \| \| \| \| \|	V3: fixup crazy check for whether we need to emit the coordinate after custom handling. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: relax brw_texture_offset assert	Chris Forbes	2013-10-26	4	-5/+10
\| \| \| \| \| \| \| \|	Some texturing ops are about to have nonconstant offset support; the offset in the header in these cases should be zero. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets.	Chris Forbes	2013-10-26	6	-3/+20
\| \| \| \| \| \| \| \| \|	The generator code ends up clearer this way than if we had to sniff via the message length. Implemented via the gather4_po message in hardware, which is present in Gen7 and later. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: add missing tg4 case in brw_instruction_name	Chris Forbes	2013-10-26	1	-0/+2
\| \| \| \| \|	Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
*	i965: Weaken the flushing in gen7_end_transform_feedback().	Kenneth Graunke	2013-10-25	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Since 062317d6671 (i965: Go back to using the kernel SOL reset feature.) we've been flushing the batch on BeginTransformFeedback(). So it's not necessary to do it on EndTransformFeedback(). A PIPE_CONTROL will work. This makes gen7_end_transform_feedback() exactly the same as the gen6 variant. However, they'll diverge again shortly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Stop trying to hack around MRF dep chains on gen7+ LIFO scheduling.	Eric Anholt	2013-10-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was a hack to avoid choosing to schedule all texturing before consumption of any texture results due to the way dependency chains worked out in the presence of MRFs. On gen7, we don't have MRFs, so the problem doesn't apply, and this was just badly constraining our scheduling. total instructions in shared programs: 1615306 -> 1612534 (-0.17%) instructions in affected programs: 9958 -> 7186 (-27.84%) GAINED: 259 LOST: 9 Reviewed-by: Matt Turner <[email protected]>
*	i965: Try not to reverse-schedule things when doing LIFO scheduling.	Eric Anholt	2013-10-25	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The LIFO plan was simple: Take the most recently made available instructions, and pick those first. But because of the order we were pushing things onto our list of available-to-schedule instructions, it meant that when a set of instructions was made available at the same time (for example, everything at the start of the program that didn't depend on other instructions) we'd schedule them in reverse order. If you had 10 texture calls in a row in your program, each with independent argument setup, we'd set up the last texture call's args and execute it first, even though we wouldn't be able to consume its results until we'd finished the other 9 texture calls (assuming consumption of texture results happens near each texture call, and combines it with another texture result, which is normal for a convolution shader). To fix this, walk the list for doing LIFO in the order that instructions were originally generated in the program, but choose to push newly-made-available instructions to the other end of the list instead. total instructions in shared programs: 1587242 -> 1586290 (-0.06%) instructions in affected programs: 7801 -> 6849 (-12.20%) GAINED: 76 LOST: 67 Thanks to Chia-I Wu for pointing out the bug in my first version of the patch that made it a huge loss. Reviewed-by: Matt Turner <[email protected]>
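The ordering problem and the fix can be reproduced with a very small model. This is a sketch under simplifying assumptions, not the i965 scheduler: sched_node, lifo_schedule and the push_new_to_front flag are hypothetical, and latencies and register pressure are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical dependency-graph node: which instructions consume this one,
// and how many of its own inputs have not been scheduled yet.
struct sched_node {
   std::vector<int> users;
   int unmet_deps;
};

// push_new_to_front = false models the old behaviour: everything is appended
// to the tail and the tail is picked, so the initial batch comes out reversed.
// push_new_to_front = true models the fix: walk from the head (program order)
// and push newly available instructions onto the head, so they still win.
std::vector<int> lifo_schedule(std::vector<sched_node> nodes,
                               bool push_new_to_front)
{
   std::deque<int> ready;
   for (std::size_t i = 0; i < nodes.size(); i++) {
      if (nodes[i].unmet_deps == 0)
         ready.push_back(static_cast<int>(i));   // program order
   }

   std::vector<int> order;
   while (!ready.empty()) {
      int chosen;
      if (push_new_to_front) {
         chosen = ready.front();
         ready.pop_front();
      } else {
         chosen = ready.back();
         ready.pop_back();
      }
      order.push_back(chosen);

      for (int user : nodes[chosen].users) {
         assert(nodes[user].unmet_deps > 0);
         if (--nodes[user].unmet_deps == 0) {
            if (push_new_to_front)
               ready.push_front(user);
            else
               ready.push_back(user);
         }
      }
   }
   return order;
}
```

For a program of two independent texture calls (0: args A, 1: texture A, 2: args B, 3: texture B), the old ordering yields 2, 3, 0, 1 and the fixed ordering yields 0, 1, 2, 3. Both are LIFO with respect to newly available instructions, but only the second keeps the initial batch in program order.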
*	mesa/st: disable ARB_framebuffer_object when no driver support.	Ilia Mirkin	2013-10-26	1	-2/+5
\| \| \| \| \| \| \| \|	When PIPE_CAP_MIXED_FRAMEBUFFER_SIZES is not provided, parts of ARB_framebuffer_object can't be supported, such as on NV30. Signed-off-by: Ilia Mirkin <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
*	i965/fs: Match commutative expressions with reversed arguments.	Matt Turner	2013-10-25	1	-3/+23
\| \| \| \| \| \| \|	total instructions in shared programs: 1645011 -> 1644938 (-0.00%) instructions in affected programs: 17543 -> 17470 (-0.42%) Reviewed-by: Eric Anholt <[email protected]>
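A minimal sketch of the matching rule, assuming it is applied when two expressions are compared for equivalence. The opcode enum, the expr struct and integer operand ids are hypothetical stand-ins for the real instruction types.

```cpp
// Hypothetical miniature IR: an opcode and two operand ids.
enum class opcode { add, mul, sub };

struct expr {
   opcode op;
   int src0;
   int src1;
};

constexpr bool is_commutative(opcode op)
{
   return op == opcode::add || op == opcode::mul;
}

// Two expressions match if they are identical, or if the opcode is
// commutative and the operands are the same pair in swapped order.
constexpr bool expressions_match(const expr &a, const expr &b)
{
   return a.op == b.op &&
          ((a.src0 == b.src0 && a.src1 == b.src1) ||
           (is_commutative(a.op) && a.src0 == b.src1 && a.src1 == b.src0));
}
```

Under this rule add(r1, r2) matches add(r2, r1), while sub(r1, r2) and sub(r2, r1) stay distinct.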
*	i965: s/Muchnik/Muchnick/.	Matt Turner	2013-10-25	4	-4/+4
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	mesa: Fix geometry shader program queries.	Paul Berry	2013-10-24	1	-60/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The queries GEOMETRY_VERTICES_OUT, GEOMETRY_INPUT_TYPE, and GEOMETRY_OUTPUT_TYPE (defined by GL 3.2) differ from the corresponding queries in ARB_geometry_shader4 in the following ways: - They use different enum values - They can only be queried; they cannot be set. - Attempting to query them yields INVALID_OPERATION if the program is not linked, or lacks a geometry shader. This patch switches us over from the ARB_geometry_shader4 behaviour to the GL 3.2 behaviour. Fixes piglit test query-gs-prim-types. v2: Improve comment above has_core_gs. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Reduce gl_MaxGeometryInputComponents to 64.	Paul Berry	2013-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although in principle there is no hardware limitation that prevents gl_MaxGeometryInputComponents from being set to 128 on Gen7, we have the following limitations in the vec4 compiler back end: - Registers assigned to geometry shader inputs can't be spilled or later re-used for any other purpose. - The last 16 registers are set aside for the "MRF hack", meaning they can only be used to send messages, and not for general purpose computation. - Up to 32 registers may be reserved for push constants, even if there is sufficient register pressure to make this impractical. A shader using 128 geometry input components, and having an input type of triangles_adjacency, would use up: - 1 register for r0 (which holds URB handles and various pieces of control information). - 1 register for gl_PrimitiveID. - 102 registers for geometry shader inputs (17 registers per input vertex, assuming DUAL_INSTANCED dispatch mode and allowing for one register of overhead for gl_Position and gl_PointSize, which are present in the URB map even if they are not used). - Up to 32 registers for push constants. - 16 registers for the "MRF hack". That's a total of 152 registers, which is well over the 128 registers the hardware supports. Fortunately, the GLSL 1.50 spec allows us to reduce gl_MaxGeometryInputComponents to 64. Doing that frees up 48 registers, bringing the total down to 104 registers, leaving 24 registers available to do computation. Fixes piglit test spec/glsl-1.50/execution/geometry/max-input-components. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
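The register arithmetic in this message can be reproduced with a short calculation. The per-vertex formula is an inference rather than something the message states: it assumes four components per vec4 and two vec4 inputs per payload register in DUAL_INSTANCED mode, plus the one overhead register, which gives the quoted 17 registers per vertex at 128 components. The struct and function names are hypothetical.

```cpp
#include <cassert>

struct gs_register_budget {
   int input_regs;   // registers holding geometry shader inputs
   int total_regs;   // worst-case registers reserved before any computation
   int spare_regs;   // what remains of the hardware registers (may be negative)
};

gs_register_budget worst_case_budget(int max_input_components)
{
   assert(max_input_components > 0 && max_input_components % 8 == 0);

   const int hw_regs = 128;             // registers the hardware supports
   const int r0_regs = 1;               // URB handles and control information
   const int primitive_id_regs = 1;     // gl_PrimitiveID
   const int push_constant_regs = 32;   // worst case quoted in the message
   const int mrf_hack_regs = 16;        // set aside for the "MRF hack"
   const int vertices = 6;              // triangles_adjacency

   // Assumption: 8 components (two vec4s) per interleaved register, plus one
   // register of overhead per vertex for gl_Position and gl_PointSize.
   const int regs_per_vertex = max_input_components / 8 + 1;

   gs_register_budget b;
   b.input_regs = vertices * regs_per_vertex;
   b.total_regs = r0_regs + primitive_id_regs + b.input_regs +
                  push_constant_regs + mrf_hack_regs;
   b.spare_regs = hw_regs - b.total_regs;
   return b;
}
```

With 128 components this gives 102 input registers and a total of 152, which is 24 over budget. With 64 components it gives 54 input registers and a total of 104, leaving 24 registers for computation, in line with the figures above.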
*	i965/gs: If a DUAL_OBJECT gs would spill, fall back to DUAL_INSTANCED.	Paul Berry	2013-10-24	3	-2/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is similar to what we do for 16-wide vs 8-wide fragment shaders. First we try compiling the geometry shader in DUAL_OBJECT mode. If we can't do that without spilling, we fall back on DUAL_INSTANCED mode, which should require less spilling (since it uses an interleaved layout of payload registers). In an ideal world we'd fall back to SINGLE mode, which would allow us to interleave general-purpose registers too (resulting in even less likelihood of spilling). But at the moment, the vec4 generator and visitor classes don't have the infrastructure to interleave general purpose registers, so DUAL_INSTANCED is the best we can do. As a side benefit this paves the way for implementing instanced geometry shaders (which are incompatible with DUAL_OBJECT mode). Since most geometry shaders used in piglit testing are small, DUAL_INSTANCED mode won't get exercised very much in a normal piglit run. To force DUAL_INSTANCED mode to be used for all geometry shaders, set INTEL_DEBUG=nodualobj. Reviewed-by: Eric Anholt <[email protected]>
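The fallback strategy can be sketched as a two-attempt driver. Everything named here is hypothetical (dispatch_mode, compile_fn, compile_gs and the no_dual_object flag standing in for INTEL_DEBUG=nodualobj). The sketch assumes the first attempt runs with spilling disabled and the second with spilling allowed.

```cpp
#include <cassert>
#include <functional>

enum class dispatch_mode { dual_object, dual_instanced };

struct compile_result {
   bool ok;
   dispatch_mode mode;
};

// Hypothetical back-end entry point: returns true if the shader compiled in
// the given mode. When allow_spilling is false, needing a spill is a failure.
using compile_fn = std::function<bool(dispatch_mode, bool allow_spilling)>;

compile_result compile_gs(const compile_fn &try_compile, bool no_dual_object)
{
   assert(static_cast<bool>(try_compile));

   // First choice: DUAL_OBJECT, but only if it fits without spilling.
   if (!no_dual_object && try_compile(dispatch_mode::dual_object, false))
      return { true, dispatch_mode::dual_object };

   // Fallback: DUAL_INSTANCED, where spilling is accepted if unavoidable.
   const bool ok = try_compile(dispatch_mode::dual_instanced, true);
   return { ok, dispatch_mode::dual_instanced };
}
```

Passing no_dual_object = true skips the first attempt entirely, which models forcing DUAL_INSTANCED for every geometry shader during testing.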
*	i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.	Paul Berry	2013-10-24	2	-1/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Geometry shaders that run in "DUAL_INSTANCED" mode store their inputs in vec4's. This means that when compiling gl_PointSize input swizzling (a MOV instruction which uses a geometry shader input as both source and destination), we need to do two things: - Set force_writemask_all to ensure that the MOV happens regardless of which channels are enabled. - Set the source register region to <4;4,1> (instead of <0;4,1>) to satisfy register region restrictions. v2: move the source register region fixup to the top of vec4_generator::generate_vec4_instruction(), so that it applies to all instructions rather than just MOV. Reviewed-by: Eric Anholt <[email protected]>
*	i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader.	Paul Berry	2013-10-24	4	-8/+30
\| \| \| \| \| \|	Not yet enabled. Reviewed-by: Eric Anholt <[email protected]>
*	i965/vec4: Add the ability to suppress register spilling.	Paul Berry	2013-10-24	7	-10/+23
\| \| \| \| \| \| \| \| \|	In future patches, this will allow us to first try compiling a geometry shader in DUAL_OBJECT mode (which is more efficient but uses more registers) and then if spilling is required, fall back on DUAL_INSTANCED mode. Reviewed-by: Eric Anholt <[email protected]>
*	i965/vec4: if register allocation fails, don't try to schedule.	Paul Berry	2013-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Otherwise the scheduler would be invoked with prog_data->total_grf == 0, causing havoc. In a future patch, this will allow us to try compiling a geometry shader in DUAL_OBJECT mode with spilling disabled, and then fall back to DUAL_INSTANCED mode if that failed. Reviewed-by: Eric Anholt <[email protected]>
*	i965/vec4: Add the ability for attributes to be interleaved.	Paul Berry	2013-10-24	3	-6/+27
\| \| \| \| \| \| \| \| \| \| \| \|	When geometry shaders are operated in "single" or "dual instanced" mode, a single set of geometry shader inputs is interleaved into the thread payload (with each payload register containing a pair of inputs) in order to save register space. This patch modifies vec4_visitor::lower_attributes_to_hw_regs so that it can handle the interleaved format. Reviewed-by: Eric Anholt <[email protected]>