Reviewed-by: Kenneth Graunke <[email protected]>

Reviewed-by: Kenneth Graunke <[email protected]>

Reviewed-by: Kenneth Graunke <[email protected]>

Most likely we had only ever used this macro on bitfields of fewer
than 31 bits -- that's going to change shortly.

Reviewed-by: Kenneth Graunke <[email protected]>
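
A minimal sketch of the failure mode, assuming a mask-building macro in
the usual (((1 << width) - 1) << low) style -- the names below are
hypothetical, not the macro from the commit:

    #include <stdint.h>

    /* Classic form: builds the mask for a field occupying bits
     * [high:low].  The shift count reaches 32 for a field ending at
     * bit 31, which is undefined behavior on a 32-bit int.
     */
    #define FIELD_MASK(high, low) \
       (((1 << ((high) - (low) + 1)) - 1) << (low))

    /* A 64-bit-safe variant: FIELD_MASK64(31, 0) == 0xffffffff. */
    #define FIELD_MASK64(high, low) \
       ((uint32_t)((~0ull >> (63 - (high))) & (~0ull << (low))))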

More than half of the stuff in intel_reg.h had nothing whatsoever to do
with registers and really belongs in brw_defines.h anyway.

Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

We no longer use this message. As far as I can tell, it's fairly
useless - the equivalent information is provided in the payload.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>

brw_wm_barycentric_interp_mode is wordy; brw_barycentric_mode is less
typing and suffers from fewer line wrapping problems.

The enum values themselves don't really benefit from "WM" in the name,
either. Put "BARYCENTRIC" first instead of at the end and drop "WM".

Generated by:

    for file in *.c *.cpp *.h; do sed -i \
       -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \
       -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \
       -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \
       $file;
    done

with a few whitespace changes.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>

v2 (Matt):
- Take a DF source argument for the DIM instruction emission in the
  visitors.
- Indentation.

Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>

Signed-off-by: Eric Engestrom <[email protected]>

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data, which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
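
A sketch of the arithmetic only -- the actual payload layout is what the
commit defines, and all names below are hypothetical:

    #include <stdint.h>

    /* gl_LocalInvocationIndex for a channel is just an offset from the
     * per-thread ID stored in the push constants; gl_LocalInvocationID
     * falls out by div/mod with the local workgroup size.
     */
    static void
    local_ids(uint32_t per_thread_id, uint32_t channel,
              uint32_t size_x, uint32_t size_y,
              uint32_t *id_x, uint32_t *id_y, uint32_t *id_z)
    {
       uint32_t local_index = per_thread_id + channel;
       *id_x = local_index % size_x;
       *id_y = (local_index / size_x) % size_y;
       *id_z = local_index / (size_x * size_y);
    }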

This makes the whole LOAD_PAYLOAD munging unnecessary, which simplifies
the code and will allow the optimization to succeed in more cases,
independent of whether the LOAD_PAYLOAD instruction can be found or
not.

The following patch is squashed in:

SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().

The sampler EOT optimization pass naively assumes that the texturing
instruction provides all the data used by the FB write just because
they're standing next to each other. The least we should be checking
is whether the source and destination regions of the FB write and
texturing instructions match. Without this, the previous seemingly
harmless patch would have caused opt_sampler_eot() to misoptimize a
shader from dota-2, causing DCE to eliminate all of its 78 instructions
except for the final sampler EOT message (!).

Reviewed-by: Jason Ekstrand <[email protected]>
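
A hypothetical shape for that dataflow check -- illustrative only, with
a made-up region struct rather than the backend's actual types:

    #include <stdbool.h>

    struct region { int file; int nr; unsigned offset; unsigned size; };

    /* The FB write must read exactly what the texture op wrote for the
     * two messages to be fusable into a single sampler EOT send. */
    static bool
    regions_match(const struct region *tex_dst, const struct region *fb_src)
    {
       return fb_src->file == tex_dst->file &&
              fb_src->nr == tex_dst->nr &&
              fb_src->offset == tex_dst->offset &&
              fb_src->size <= tex_dst->size;
    }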

Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.

The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.

Reviewed-by: Jason Ekstrand <[email protected]>

It's just a byte MOV with strided source.

Reviewed-by: Jason Ekstrand <[email protected]>

These can be easily represented in the IR as a MOV instruction with
strided source, so they seem rather redundant.

Reviewed-by: Jason Ekstrand <[email protected]>

Seems like this texturing opcode was missing its logical counterpart,
which would prevent it from taking advantage of the SIMD lowering
infrastructure; define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.

Reviewed-by: Jason Ekstrand <[email protected]>

For consistency with the Gen7 variant. I'm not doing the same to the
uniform pull constant message at this point because the non-Gen7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.

Reviewed-by: Jason Ekstrand <[email protected]>

This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.

Reviewed-by: Jason Ekstrand <[email protected]>

This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.

This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
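
The idea, as a sketch rather than the actual driver code -- the uniform
and variable names here are hypothetical:

    /* gl_FragCoord.y is run through y' = ytransform[0] * y +
     * ytransform[1]; the uniform is updated when the binding changes,
     * so no recompile is needed.  Which case gets the flip depends on
     * the surface orientation convention.
     */
    float ytransform[2];
    if (rendering_to_window_system_fbo) {
       ytransform[0] = -1.0f;                  /* flip */
       ytransform[1] = (float)drawable_height;
    } else {
       ytransform[0] = 1.0f;                   /* identity for user FBOs */
       ytransform[1] = 0.0f;
    }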

Reviewed-by: Kenneth Graunke <[email protected]>

Reviewed-by: Topi Pohjolainen <[email protected]>

v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).
v3 (Ben): Do not set fast clear state in surface state setup.
          Moved into brw_postdraw_set_buffers_need_resolve()
          using a separate patch.
v4: Support for blorp
v5 (Ben): Re-use gen8_get_aux_mode()

Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>

Reviewed-by: Kenneth Graunke <[email protected]>

On gen7 (Ivy Bridge, Haswell), we will get a GPU hang if an indirect
dispatch is used, but one of the dimensions is 0. Therefore we use
predicated rendering on the GPGPU_WALKER command to handle this case.

Fixes piglit test: spec/arb_compute_shader/zero-dispatch-size

From the ARB_compute_shader spec, under DispatchCompute:

  "If the work group count in any dimension is zero, no work groups are
   dispatched."

And then for DispatchComputeIndirect:

  ... "is equivalent (assuming no errors are generated) to calling
  DispatchCompute with <num_groups_x>, <num_groups_y> and
  <num_groups_z>" ...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94100
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Ilia Mirkin <[email protected]>
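
A simplified sketch of the predication scheme -- the helper names are
illustrative, and the real MI_PREDICATE programming differs in detail:

    /* Accumulate (x == 0) || (y == 0) || (z == 0) into the MI predicate
     * state, then let the walker be skipped when any dimension is zero.
     */
    for (int i = 0; i < 3; i++) {
       emit_load_register_mem(MI_PREDICATE_SRC0, indirect_bo, i * 4);
       emit_load_register_imm(MI_PREDICATE_SRC1, 0);
       emit_mi_predicate(i == 0 ? COMBINEOP_SET : COMBINEOP_OR,
                         COMPAREOP_SRCS_EQUAL);
    }
    emit_gpgpu_walker(params, /* predicate_enable */ true);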

v2 (Ben): Use combination of msaa_layout and number of samples
          instead of introducing explicit type for lossless
          compression (intel_miptree_is_lossless_compressed()).

Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>

Reviewed-by: Kenneth Graunke <[email protected]>

These logical texture instructions can have a *lot* of sources. It's much
safer if we have symbolic names for them.

Reviewed-by: Kenneth Graunke <[email protected]>

The vec4 backend will lower it.

Reviewed-by: Iago Toral Quiroga <[email protected]>

Setting interleave on the TCS EOT message causes Ivybridge hardware to
GPU hang like crazy. Individual tests would pass, but running even a
simple test like nop.shader_test in a loop would hang within 1-3 runs.
Adding sleep delays worked around the problem, somehow.

Interleave doesn't make much sense given that we only have one patch
URB handle, not two. Complete doesn't seem useful either. There's no
reason to actually set those bits. We were just being lazy.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>

Pre-Broadwell hardware requires us to manually release the ICP Handles
by issuing URB read messages with the "Complete" bit set. We can do
this in pairs to use fewer URB read messages.

Based heavily on work from Chris Forbes.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>

This can be used on Broadwell by setting INTEL_SCALAR_TES=0.
More importantly, it will be used for Ivybridge and Haswell.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>

The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.

One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables, which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.

We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.

v2: Update comments (requested by Jordan Justen).

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
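
A sketch of that guard using the brw_eu emitter -- illustrative only;
the register choice and the vertex-count source below are made up:

    /* f0 = (gl_InvocationID < output control point count), so the
     * phantom second half of the last SIMD4x2 thread executes nothing
     * when the patch has an odd number of control points.
     */
    brw_CMP(p, brw_null_reg(), BRW_CONDITIONAL_L,
            invocation_id_reg, brw_imm_ud(output_vertices));
    brw_IF(p, BRW_EXECUTE_8);
    /* ... the entire TCS program body goes here ... */
    brw_ENDIF(p);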

v3:
 * Check shared variable size at link time

Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>

readable.

Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>

v2 (Francisco Jerez):
- Rename HSW_GATHER_CONSTANTS_RESERVED to HSW_GATHER_POOL_ALLOC_MUST_BE_ONE.
- Rename BRW_GATHER_* prefix to HSW_GATHER_CONSTANT_*.

Reviewed-by: Francisco Jerez <[email protected]>
Signed-off-by: Abdiel Janulgue <[email protected]>

Reviewed-by: Kristian Høgsberg <[email protected]>

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Abdiel Janulgue <[email protected]>

The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex). The thread payload contains
several URB handles which reference these separate memory segments.

In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read. This
array index does not necessarily need to be constant.

To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)

This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data).

Based on a patch by Jason Ekstrand.

v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
    a bit more generic. Use regs_read() instead of hacking up the
    register allocator. (Suggested by Jason Ekstrand.)

v3: Fix regs_read() to be more accurate for small unaligned regions.
    Also rebase on Matt's work.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> [v3]
Reviewed-by: Abdiel Janulgue <[email protected]> [v1]
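
A conceptual model of the VxH semantics in C -- not driver code;
grf_file and addr stand in for the GRF file and the address register:

    #include <stdint.h>

    /* Each of the 8 channels reads through its own address slot, so
     * every channel can pull its URB handle from a different payload
     * register.  addr[ch] holds a byte offset into the register file.
     */
    static void
    mov_indirect_vxh(uint32_t dst[8], const void *grf_file,
                     const uint16_t addr[8])
    {
       for (unsigned ch = 0; ch < 8; ch++)
          dst[ch] = *(const uint32_t *)((const char *)grf_file + addr[ch]);
    }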

We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index. We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Abdiel Janulgue <[email protected]>

The first four values (2 bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.

Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

Add some instructions: illegal, movi, sends, sendsc.

Remove some instructions with reused opcodes: msave, mrestore, push,
pop, goto. I did have some gross code for disassembling opcodes
per-generation, but there's very little meaningful overlap, so it's
probably not needed.

Reviewed-by: Kenneth Graunke <[email protected]>

Inspired by a patch by Fabian Bieler.

Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid
topology type); I instead chose to make a macro that takes an argument.
He also took the number of patch vertices from _mesa_prim (which was set
to ctx->TessCtrlProgram.patch_vertices); I chose to use it directly to
avoid the need for the VBO patch.

v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match
    the documentation (suggested by Ian).

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
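
The macro as described would look roughly like this (a sketch; the
in-tree definition may add range assertions):

    /* Patch topology IDs start at 0x20 for a 1-vertex patch, so a
     * patch with n control points is 0x20 + (n - 1).
     */
    #define _3DPRIM_PATCHLIST(n) (0x20 + (n) - 1)

    /* e.g. a 4-control-point patch: _3DPRIM_PATCHLIST(4) == 0x23 */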

In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved by the
ld_mcs instruction already spans 4 or 8 registers, and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.

Reviewed-by: Ben Widawsky <[email protected]>

v2: Remove useless source_stencil_to_render_target (Ken)

Squash in the actual packing function, which also got to v2:
- Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
- Reorder the regioning to be in VWH order (Matt)
- Don't retype src in the backend, just assert instead (Matt)
- Rename the debug prints to something better (Matt)

Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

Gen9 adds the ability to write out a stencil value, so we need to expand
the virtual payload by one. Abstracting this now makes that change easier
to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
- Remove explicit numbering from the enumeration (Matt).
- Use a real naming scheme, and reference it in the opcode definition (Curro)
- Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
- Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner <[email protected]>
Cc: Francisco Jerez <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

In scalar mode, geometry shader inputs can easily take up hundreds of
registers. This makes pushing VUE entries impractical; we'll need to
resort to the pull model in some cases.

To support this, we introduce a new opcode corresponding to the "URB
Read SIMD8" message.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>

In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
There are many kinds of flags for SIMD4x2 messages.

However, there are really only two (per-slot offset, use channel masks)
for SIMD8 messages. Rather than adding a boolean flag for per-slot
offsets (polluting all instructions), I decided to just make three new
opcodes.

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>

All the documentation I can find says that this bit (and functionality) only
exists on SKL+. Since the bit isn't yet used, there is no real impact here.

The original code was added by Ken here (a surprisingly long time ago):

    commit f3c6d6f1e151f6a44a76038dccebe4434038dcb1
    Author: Kenneth Graunke <[email protected]>
    Date:   Thu Nov 29 21:00:27 2012 -0800

        i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA.

Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>

Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>

Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization. Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly. However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.

Thanks to the new NIR GS intrinsics, this is easy. We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.

Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.

shader-db statistics for geometry shaders only:

    total instructions in shared programs: 3227 -> 3207 (-0.62%)
    instructions in affected programs:     242 -> 222 (-8.26%)
    helped:                                10

v2: Don't break non-NIR paths (just skip this optimization).

Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>