mesa.git

	Commit message	Author	Age	Files	Lines
*	intel: Apply Geminilake "Barrier Mode" workaround.	Kenneth Graunke	2018-01-09	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <[email protected]>
*	i965: Move PIPE_CONTROL defines and prototypes to brw_pipe_control.h.	Kenneth Graunke	2017-12-04	1	-43/+0
\| \| \| \| \| \| \| \| \|	We need to be able to emit PIPE_CONTROLs from genX_state_upload.c, which can't safely include brw_defines.h because it conflicts with genxml. Move all the PIPE_CONTROL related stuff together into a separate header. Reviewed-by: Matt Turner <[email protected]>
*	i965: Remove DWord length from MI_FLUSH_DW definition	Anuj Phogat	2017-11-17	1	-1/+1
\| \| \| \| \| \| \| \|	Fixes: 6165fda59b8 ("i965: Program DWord Length in MI_FLUSH_DW") Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/gen10: Implement Wa3DStateMode	Anuj Phogat	2017-11-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This workaround doesn't fix any of the piglit hangs we've seen on CNL. But it might be fixing something we haven't tested yet. V2: Remove the bits enabling Float blend optimization. It is enabled through CACHE_MODE_SS register. Update the comment. Move gen10 if block on top of gen9 if block. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
*	i965/gen10: Enable float blend optimization	Anuj Phogat	2017-11-03	1	-0/+3
\| \| \| \| \| \| \| \| \|	This optimization is enabled for previous generations too. See Mesa commit c17e214a6b On CNL this bit has been moved to CACHE_MODE_SS register. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
*	i965/gen10: Implement WaSampleOffsetIZ workaround	Anuj Phogat	2017-11-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a few other (duplicate) workarounds which have similar recommendations: WaFlushHangWhenNonPipelineStateAndMarkerStalled WaCSStallBefore3DSamplePattern WaPipeControlBefore3DStateSamplePattern WaPipeControlBefore3DStateSamplePattern has some extra recommendations if the driver is using mid-batch context restore. Ignoring it for now because we're not doing mid-batch context restore in Mesa. This workaround doesn't fix any of the piglit hangs we've seen on CNL. But it might be fixing something we haven't tested yet. V2: Use brw_load_register_imm32() to program CACHE_MODE_0. Get rid of brw_flush_gpu_caches(). V3: Make the workaround helper functions static. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
*	i965: Set "Subslice Hashing Mode" to 16x16 on Apollolake.	Kenneth Graunke	2017-08-02	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of 4.11, the kernel isn't bothering to set the subslice hashing mode on Apollolake, leaving it at the default of 8x8. (It initializes it to 16x4 on most platforms.)

Performance data for GPUTest Triangle on Apollolake at 1024x640:

X-tiled RT:
-----------
8x8 -> 16x4:    2.4325%   +/- 0.383683% (n=107)
8x8 -> 8x4:    -3.75105%  +/- 0.592491% (n=40)
8x8 -> 16x16:   6.17238%  +/- 0.67157%  (n=30)

Y-tiled RT:
-----------
8x8 -> 16x4:    1.30307%  +/- 0.297292% (n=205)
8x8 -> 8x4:    -0.769282% +/- 0.729557% (n=35)
8x8 -> 16x16:   3.00254%  +/- 0.715503% (n=40)

8x MSAA RT (INTEL_FORCE_MSAA=8):
--------------------------------
8x8 -> 16x4:    1.38889%  +/- 0.93729%  (n=7)
8x8 -> 8x4:    -2.10643%  +/- 1.15153%  (n=3)
8x8 -> 16x16:   3.87183%  +/- 1.08851%  (n=5)

Based on this, we choose 16x16 for Apollolake. Skylake GT2 with X-tiled buffers appears to be a toss-up between 16x4 and 16x16, and with Y-tiled buffers it doesn't seem to really matter. So we'll leave Skylake alone for now. The hashing mode doesn't seem to make a measurable impact on more complex benchmarks. Acked-by: Matt Turner <[email protected]>
*	i965: Switch to absolute addressing for constant buffer 0.	Kenneth Graunke	2017-07-13	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. I'd like to be able to use all four push buffers. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance. Reviewed-by: Matt Turner <[email protected]>
*	i965: Add Gen8+ INTEL_performance_query support	Robert Bragg	2017-06-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Enables access to OA unit metrics on Gen8+ via INTEL_performance_query. v2: make use of new parameters coming from gen_device_info (Lionel) Signed-off-by: Robert Bragg <[email protected]> Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Set the "Float Blend Optimization Enable" bit on Gen9+.	Kenneth Graunke	2017-05-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is woefully undocumented. It's some kind of optimization that avoids unnecessary render target reads when blending with a floating point render target, using independent alpha blending modes. The internal documentation indicates that this bit exists on Cherryview as well, but the other driver doesn't appear to set it on that platform. There's also some confusing wording that indicates that it may exist on Broadwell, but the documentation says it's reserved, so who knows. I was not able to find any workload that benefited from setting this bit, but it seems like a good idea to set it nonetheless. Reviewed-by: Plamena Manolova <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
*	i965: Move clip program compilation to the compiler	Jason Ekstrand	2017-05-26	1	-7/+0
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move SF compilation to the compiler	Jason Ekstrand	2017-05-26	1	-2/+0
\| \| \| \|	Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Move MOCS macros to brw_context.h.	Rafael Antognolli	2017-05-03	1	-42/+0
\| \| \| \| \| \| \| \| \| \| \| \|	These macros are defined in brw_defines.h, which contains a lot of macros that conflict with autogenerated code from genxml. But we need to use them (the MOCS macros) in some of that same genxml code. Moving them to brw_context.h solves that problem and we don't have to include brw_defines.h. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Delete tile resource mode code	Anuj Phogat	2017-03-27	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \|	Yf/Ys tiling never got used in i965 due to not delivering the expected performance benefits. So, this patch is deleting this dead code in favor of adding it later in ISL when we actually find it useful. ISL can then share this code between vulkan and GL. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	intel: fix compiler build	Iago Toral Quiroga	2017-03-13	1	-8/+0
\| \| \| \| \| \| \| \| \|	compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES); Fixes: d0d4a5f43b4 ("i965: split EU defines to brw_eu_defines.h") Reviewed-by: Emil Velikov <[email protected]>
*	i965: split EU defines to brw_eu_defines.h	Emil Velikov	2017-03-13	1	-1188/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split out the EU defines from the 'generic' ones, as the former are more compiler oriented. With a later commit we'll move brw_eu_defines.h alongside the compiler infra to src/intel/. Pulling all the defines in there seems overzealous. Some defines are used by both i965 and the i965 compiler. Those are moved to brw_eu_defines.h, and annotated accordingly. The i965 users were updated to have the extra include to indicate that. With future work we might provide a better split, but for now this seems reasonable. Cc: Kenneth Graunke <[email protected]> Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: move brw_define.h ifndef guard to the top	Emil Velikov	2017-03-13	1	-3/+3
\| \| \| \| \| \|	Cc: [email protected] Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: remove unused macros from brw_defines.h	Emil Velikov	2017-03-13	1	-19/+1
\| \| \| \| \| \| \| \| \| \| \|	The following three groups are used by neither the DRI module nor the compiler. BRW_POLYGON__FACING BRW_POLYGON_FACING_ BRW_STATELESS_BUFFER_* Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Delete vestiges of resource streamer code.	Kenneth Graunke	2017-03-06	1	-52/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	We never actually used the resource streamer in any shipping build of Mesa. We have no plans to do so in the future. We looked into using it in Vulkan, and concluded that it was unusable. We're not the only ones to arrive at the conclusion that it's not worth using. So, drop the last vestiges of resource streamer support and move on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
*	i965: Replace BRW_SURFACEFORMAT_* with ISL_FORMAT_*.	Kenneth Graunke	2017-03-02	1	-247/+0
\| \| \| \| \| \| \| \| \| \| \| \|	One less set of enums. Dropped the #defines from brw_defines.h and ran:

$ for file in *.cpp *.c *.h; do
     sed -i \
         -e 's/BRW_SURFACEFORMAT_/ISL_FORMAT_/g' \
         -e 's/ISL_FORMAT_ASTC_[A-Zxs0-9_]*/\U&/g' $file; \
  done

Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
*	i965: Reduce cross-pollination between the DRI driver and compiler	Jason Ekstrand	2017-03-01	1	-0/+2
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Get rid of BRW_PRIM_OFFSET	Jason Ekstrand	2017-03-01	1	-8/+0
\| \| \| \| \| \| \|	This is a relic of when we wired up meta to be able to use RECTLIST primitives. It's no longer needed. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/vec4: Rename DF to/from F generator opcodes	Iago Toral Quiroga	2017-01-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	The opcodes are not specific for conversions to/from float since we need the same for conversions to/from other 32-bit types. Rename the opcodes accordingly and change the asserts to check the size of the types involved instead. v2: - Rename to VEC4_OPCODE_TO_DOUBLE and VEC4_OPCODE_FROM_DOUBLE (Matt) Reviewed-by: Matt Turner <[email protected]>
*	i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes	Iago Toral Quiroga	2017-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These opcodes will set the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this to implement packDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen. v2: - use suboffset() instead of get_element_ud() - no need to set the width on the dst Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes	Iago Toral Quiroga	2017-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These opcodes will pick the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this, for example, to do things like unpackDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen. v2: - use suboffset() instead of get_element_ud() - no need to set the width on the dst Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
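As an illustration of what the PICK and SET opcodes compute per 64-bit element, here is a minimal scalar sketch in C. It is not the vec4 backend code: the function names are hypothetical, it assumes IEEE 754 binary64 doubles, and it ignores Align1/Align16 register regions entirely.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the low 32 bits of a double's bit pattern. */
static inline uint32_t pick_low_32bit(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* type-pun without aliasing issues */
    return (uint32_t)bits;
}

/* Hypothetical helper: return the high 32 bits of a double's bit pattern. */
static inline uint32_t pick_high_32bit(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Hypothetical helper: set the low and high halves of a 64-bit element,
 * which is the effect packDouble2x32 has on one component. */
static inline double pack_double_2x32(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Splitting a value with the two pick helpers and feeding the halves back into pack_double_2x32() round-trips the original double, which mirrors the unpackDouble2x32/packDouble2x32 pairing mentioned in the commit messages.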
*	i965/vec4: add double/float conversion pseudo-opcodes	Iago Toral Quiroga	2017-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These need to be emitted as align1 MOV's, since they need to have a stride of 2 on the float register (whether src or dest) so that data from another thread doesn't cross the middle of a SIMD8 register. v2 (Iago): - The float-to-double needs to align 32-bit data to 64-bit before doing the conversion. This was doable in align16 when we tried to use an execsize of 4, but with an execsize of 8 we would need another align1 opcode to do that (since we need data to cross the middle of a SIMD register). Just making the opcode handle this internally seems more practical than adding another opcode just for this purpose and having the caller know about this before converting. - The double-to-float conversion produces 32-bit elements aligned to 64-bit so we make the opcode re-pack the result to 32-bit and fit in one register, as expected by SIMD4x2 operation. This still requires that callers reserve two registers for the float data destination because we need to produce 64-bit aligned data first, and repack it later on the same destination register, but it saves the need for a re-pack opcode only to achieve this making the operation complete in a single opcode. Hopefully that is worth the weirdness of the double register allocation... Signed-off-by: Connor Abbott <[email protected]> Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.	Francisco Jerez	2016-12-14	1	-1/+0
\| \| \| \| \| \|	Not used anymore. It was just a scalar MOV. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Factor out oword block read and write message control calculation.	Francisco Jerez	2016-12-14	1	-0/+6
\| \| \| \| \| \| \| \| \|	We'll need roughly the same logic in other places and it would be annoying to duplicate it. Instead factor it out into a function-like macro that takes the number of dwords per block (which will prove more convenient than taking the same value in owords or some other unit). Reviewed-by: Kenneth Graunke <[email protected]>
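A function-like macro of the kind described above might look like the following sketch. The enum names and their numeric encodings are placeholders chosen for illustration, not values taken from the header or the hardware documentation; the only grounded facts are that the argument is the number of dwords per block and that one oword is four dwords.

```c
#include <stdlib.h>

/* Placeholder message-control encodings (assumed, for illustration only). */
enum oword_block_ctrl {
    OWORD_BLOCK_1_OWORD  = 0,
    OWORD_BLOCK_2_OWORDS = 2,
    OWORD_BLOCK_4_OWORDS = 3,
    OWORD_BLOCK_8_OWORDS = 4
};

/* Map a block size in dwords to its control encoding.
 * One oword is 16 bytes, i.e. four 32-bit dwords, so 4 dwords is one oword.
 * Unsupported sizes abort instead of silently producing a bogus encoding. */
#define OWORD_BLOCK_DWORDS(n)              \
    ((n) == 4  ? OWORD_BLOCK_1_OWORD  :    \
     (n) == 8  ? OWORD_BLOCK_2_OWORDS :    \
     (n) == 16 ? OWORD_BLOCK_4_OWORDS :    \
     (n) == 32 ? OWORD_BLOCK_8_OWORDS :    \
     (abort(), ~0))
```

Taking dwords rather than owords lets callers pass a register or payload size directly, which is the convenience the commit message refers to.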
*	treewide: s/comparitor/comparator/	Ilia Mirkin	2016-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g' Just happened to notice this in a patch that was sent and included one of the tokens in question. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Nicolai Hähnle <[email protected]>
*	i965: enable INTEL_conservative_rasterization on Gen9+	Lionel Landwerlin	2016-12-07	1	-0/+1
\| \| \| \| \|	Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
*	i965/fs: Refactor handling of constant tg4 offsets	Jason Ekstrand	2016-11-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we had an OFFSET_VALUE source for logical texture instructions that was intended to mean exactly what it says, "offset". In reality, we only fully used it for tg4 offsets. We used offset_value.file == IMM to mean, "you have a constant offset, go look in instr->offset" and didn't actually use the contents of the register at all in that case except for in nir_emit_texture where we used it as a temporary before we copy it into instr->offset. This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to indirect tg4 offsets only. The nir_emit_texture code is refactored so that we explicitly build a header_bits value which is placed in instr->offset and the constant offset values (both for tg4 and regular texture operations) are used to construct header_bits and don't go through the offset source at all. Finally, we stop passing offset_value in to lower_sampler_logical_send_gen5 because we can't do indirect offsets until gen7 anyway. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+.	Kenneth Graunke	2016-11-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP. Gen8+ normally uses the new internal signalling mechanism to select the one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS). This is a pretty good fit for Vulkan, or even newer GL, where the bitmask comes entirely from the shader. But with glClipPlane(), this is dynamic state, and we have to listen to _NEW_TRANSFORM. Clip plane enables are the only reason the VS/DS/GS atoms need to listen to _NEW_TRANSFORM. 3DSTATE_CLIP already has to listen to it in order to support ARB_clip_control settings. Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can re-emit them less often. Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257% (n = 38) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/ir: Update several stale comments.	Francisco Jerez	2016-09-14	1	-1/+1
\| \| \| \|	Reviewed-by: Iago Toral Quiroga <[email protected]>
*	i965/fs: Define logical framebuffer read opcode and lower it to physical reads.	Francisco Jerez	2016-08-25	1	-0/+1
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Define framebuffer read virtual opcode.	Francisco Jerez	2016-08-25	1	-0/+3
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/eu: Add codegen support for the Gen9+ render target read message.	Francisco Jerez	2016-08-25	1	-0/+4
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.	Francisco Jerez	2016-08-25	1	-1/+1
\| \| \| \| \| \| \|	Most likely we had only ever used this macro on bitfields of less than 31 bits -- That's going to change shortly. Reviewed-by: Kenneth Graunke <[email protected]>
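The hazard described above can be shown with a small sketch. The helper below is hypothetical and is not the macro from the header; it only illustrates why the constant being shifted should be unsigned. With a plain int literal, building a 31-bit mask evaluates 1 << 31, which overflows a 32-bit signed int and is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical helper: build a mask covering bits [low, high] of a
 * 32-bit word. The shifted constant is unsigned, so a 31-bit field
 * (which needs 1 << 31) is well defined.
 * A full 32-bit field would still shift by 32 and must be avoided. */
static inline uint32_t field_mask(unsigned high, unsigned low)
{
    return ((UINT32_C(1) << (high - low + 1)) - 1) << low;
}
```

For example, field_mask(30, 0) covers the low 31 bits and field_mask(31, 1) covers the high 31 bits; both would require the problematic signed shift if the constant were a plain 1.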
*	i965: Roll intel_reg.h into brw_defines.h	Jason Ekstrand	2016-08-19	1	-0/+273
\| \| \| \| \| \| \| \|	More than half of the stuff in intel_reg.h had nothing whatsoever to do with registers and really belongs in brw_defines.h anyway. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.	Kenneth Graunke	2016-07-20	1	-1/+0
\| \| \| \| \| \| \| \| \|	We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode.	Kenneth Graunke	2016-07-15	1	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less typing and suffers from fewer line wrapping problems. The enum values themselves don't really benefit from "WM" in the name, either. Put "BARYCENTRIC" first instead of at the end and drop "WM". Generated by:

for file in *.c *.cpp *.h; do
   sed -i \
       -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \
       -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \
       -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \
       $file;
done

with a few whitespace changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: enable the emission of the DIM instruction	Samuel Iglesias Gonsálvez	2016-07-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	v2 (Matt): - Take a DF source argument for the DIM instruction emission in the visitors. - Indentation. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965/docs: update Intel Linux Graphics URLs	Eric Engestrom	2016-07-06	1	-1/+1
\| \| \| \|	Signed-off-by: Eric Engestrom <[email protected]>
*	i965: Support new local ID push constant & cross-thread constants	Jordan Justen	2016-06-01	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cross thread constant support appears on Haswell. It allows us to upload a set of uniform data for all threads without duplicating it per thread. We also support per-thread data which allows us to store a per-thread ID in one of the uniforms that can be used to calculate the gl_LocalInvocationIndex and gl_LocalInvocationID variables. v4: * Support the old local ID push constant layout as well (Jason) Cc: "12.0" <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
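The relationship between gl_LocalInvocationIndex and gl_LocalInvocationID that the commit relies on can be sketched in scalar C as follows. This shows the GLSL definition only; the struct and function names are hypothetical, and the sketch says nothing about how the driver lays out the per-thread ID in push constants.

```c
/* Hypothetical 3-component unsigned vector, standing in for GLSL uvec3. */
struct uvec3 {
    unsigned x, y, z;
};

/* GLSL defines:
 *   index = id.z * size.x * size.y + id.y * size.x + id.x
 * where size is the workgroup size. */
static inline unsigned local_index_from_id(struct uvec3 id, struct uvec3 size)
{
    return id.z * size.x * size.y + id.y * size.x + id.x;
}

/* Inverse of the formula above: recover the 3D ID from the flat index. */
static inline struct uvec3 local_id_from_index(unsigned index, struct uvec3 size)
{
    struct uvec3 id;
    id.x = index % size.x;
    id.y = (index / size.x) % size.y;
    id.z = index / (size.x * size.y);
    return id;
}
```

With a 4x2x2 workgroup, for instance, flat index 13 corresponds to the ID (1, 1, 1), and converting back yields 13 again.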
*	i965/fs: Implement opt_sampler_eot() in terms of logical sends.	Francisco Jerez	2016-05-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies the code and will allow the optimization to succeed in more cases independent of whether the LOAD_PAYLOAD instruction can be found or not. The following patch is squashed in: SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot(). The sampler EOT optimization pass naively assumes that the texturing instruction provides all the data used by the FB write just because they're standing next to each other. The least we should be checking is whether the source and destination regions of the FB write and texturing instructions match. Without this the previous seemingly harmless patch would have caused opt_sampler_eot() to misoptimize a shader from dota-2 causing DCE to eliminate all of its 78 instructions except for the final sampler EOT message (!). Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/ir: Make BROADCAST emit an unmasked single-channel move.	Francisco Jerez	2016-05-27	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alternatively we could have extended the current semantics to 32-wide mode by changing brw_broadcast() to emit multiple indexed MOV instructions in the generator copying the selected value to all destination registers, but it seemed rather silly to waste EU cycles unnecessarily copying the exact same value 32 times in the GRF. The vstride change in the Align16 path is required to avoid assertions in validate_reg() since the change causes the execution size of the MOV and SEL instructions to be equal to the source region width. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction.	Francisco Jerez	2016-05-27	1	-1/+0
\| \| \| \| \| \|	It's just a byte MOV with strided source. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Remove extract virtual opcodes.	Francisco Jerez	2016-05-27	1	-12/+0
\| \| \| \| \| \| \|	These can be easily represented in the IR as a MOV instruction with strided source so they seem rather redundant. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Handle SAMPLEINFO consistently like other texturing instructions.	Francisco Jerez	2016-05-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Seems like this texturing opcode was missing its logical counterpart which would prevent it from taking advantage of the SIMD lowering infrastructure, define it and plumb it through the back-end. At some point we'll likely want to emit a single SAMPLEINFO message shared among all channels irrespective of this change, but for the moment this should be enough to get the intrinsic working in SIMD32 mode. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Rename Gen4 physical varying pull constant load opcode.	Francisco Jerez	2016-05-27	1	-1/+1
\| \| \| \| \| \| \| \| \|	For consistency with the Gen7 variant. I'm not doing the same to the uniform pull constant message at this point because the non-GEN7 one is still overloaded to be either an expression-like logical instruction or a Gen4-specific physical send message. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Hide varying pull constant load message setup behind logical opcode.	Francisco Jerez	2016-05-27	1	-0/+1
\| \| \| \| \| \| \| \|	This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions. Reviewed-by: Jason Ekstrand <[email protected]>