aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_fs.cpp
Commit message (Collapse)AuthorAgeFilesLines
* i965 Gen4/5: Generalize SF interpolation setup for GLSL1.3Chris Forbes2013-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | Previously the SF only handled the builtin color varying specially. This patch generalizes that support to cover user-defined varyings, driven by the interpolation mode array set up alongside the VUE map. Based on the following patches from Olivier Galibert: - http://lists.freedesktop.org/archives/mesa-dev/2012-July/024335.html - http://lists.freedesktop.org/archives/mesa-dev/2012-July/024339.html With this patch, all the GLSL 1.3 interpolation tests that do not clip (spec/glsl-1.30/execution/interpolation/*-none.shader_test) pass. V5: Move key.do_flat_shading to brw_sf_compile.has_flat_shading; drop vestigial hunks. V6: Real bools. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-32/+30
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::is_<platform> flags to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::perf_debug to brw_context.Kenneth Graunke2013-07-091-2/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::batch to brw_context.Kenneth Graunke2013-07-091-3/+3
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* glsl: Remove ir_print_visitor.h includes and usageEric Anholt2013-06-211-1/+0
| | | | | | | | | | | | | We have ir->print() to do the old declaration of a visitor and having the IR accept the visitor (yuck!). And now you can call _mesa_print_ir() safely anywhere that you know what an ir_instruction is. A couple of missing printf("\n")s are added in error paths -- when an expression is handed to the visitor, it doesn't print '\n' (since it might be a step in printing a whole expression tree). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: fix problem with constant out of bounds access (v3)Dave Airlie2013-06-041-1/+14
| | | | | | | | | | | | | | | | | | | | | | | Okay I now understand why Frank would want to run away, this is my attempt at fixing the CVE out of bounds access to constants outside the range. This attempt converts any illegal constants to constant 0 as per the GL spec, and is undefined behaviour. A future patch should add some debug for users to find this out, but this needs to be backported to stable branches. CVE-2013-1872 v2: drop the last hunk which was a separate fix (now in master). hopefully fix the indentations. v3: don't fail piglit, the whole 8/16 dispatch stuff was over my head, and I spent a while figuring it out, but this one is definitely safe, one piglit pass extra on my Ironlake. NOTE: This is a candidate for stable branches. Signed-off-by: Dave Airlie <[email protected]>
* Revert "i965: fix problem with constant out of bounds access (v2)"Kenneth Graunke2013-05-291-11/+1
| | | | | | | | This reverts commit 98dfd59a0445666060c97b0dccaf0e9f030b547a. The patch was clearly not Piglit tested, as it caused at least 225 tests to start crashing with assertion failures. That was before my desktop tanked and the test run died completely.
* i965: fix problem with constant out of bounds access (v2)Dave Airlie2013-05-301-1/+11
| | | | | | | | | | | | | | | | | | | | This is my attempt at fixing this as the CVE is making RH security team care enough to make me look at this. (please upstream, security fixes are more important than whatever else you are doing, if for no other reason than it saves me having to fix stuff I've no real clue about). Since Frank's original fix was denied, here is my attempt to just alias all constants that are out of bounds < 0 or > nr_params to constant 0, hopefully this provides the undefined behaviour idr requires.. CVE-2013-1872 v2: drop the last hunk which was a separate fix (now in master). hopefully fix the indentations. NOTE: This is a candidate for stable branches. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965/fs: Make virtual grf live intervals actually cover their used range.Eric Anholt2013-05-091-10/+11
| | | | | | | | | | | | | | | | | | Previously, we would sometimes not consider a write to a register to extend the end of the interval, nor would we consider a read before a write to extend the start. This made for a bunch of complicated logic related to how to treat the results when dead code might be present. Instead, just extend the interval and fix dead code elimination to know how to remove it. Interestingly, this actually results in a tiny bit more optimization: total instructions in shared programs: 1391220 -> 1390799 (-0.03%) instructions in affected programs: 14037 -> 13616 (-3.00%) v2: Fix a theoretical problem with the simd16 workaround if dst == src, where we would revert the bump of the live range. Reviewed-by: Ian Romanick <[email protected]> (v1)
* i965/fs: Add support for bit instructions.Matt Turner2013-05-061-0/+7
| | | | | | | | | | | | Don't bother scalarizing ir_binop_bfm, since its results are identical for all channels. v2: Subtract result of FBH from 31 (unless an error) to convert MSB counts to LSB counts. v3: Use op0->clone() in ir_triop_bfi to prevent (var_ref channel_expressions) from appearing multiple times in the IR. Reviewed-by: Chris Forbes <[email protected]> [v2]
* i965: Share the register file enum between the two backends.Eric Anholt2013-05-021-6/+6
| | | | | | | | | | I need this so I can look at vec4 and fs registers' files from the same .cpp file without namespaces. As far as I can tell we never rely on the particular numerical values of the files, though I thought it sounded like a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make dump_instructions be a virtual method of the visitor.Eric Anholt2013-05-021-12/+3
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Allow LRPs with uniform registers.Eric Anholt2013-04-291-0/+6
| | | | | | | | | Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62). v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken) Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: Move is_math/is_tex/is_control_flow() to backend_instruction.Kenneth Graunke2013-04-291-45/+0
| | | | | | | | | | | | | | | These are entirely based on the opcode, which is available in backend_instruction. It makes sense to only implement them in one place. This changes the VS implementation of is_tex() slightly, which now accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those aren't generated in the VS anyway, it should be fine. This also makes is_control_flow() available in the VS. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Don't try to use bogus interpolation modes pre-Gen6.Chris Forbes2013-04-301-9/+17
| | | | | | | | | | | | | | | | | | | | | Interpolation modes other than perspective-barycentric-pixel-center (and their associated coefficients in the WM payload) only exist in Gen6 and later. Unfortunately, if a varying was declared as `centroid`, we would blindly read the nonexistant values, and so produce all manner of bad behavior -- texture swimming, snow, etc. Fixes rendering in Counter-Strike Source and Team Fortress 2 on Ironlake. NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Jordan Justen <[email protected]>
* i965: Avoid recompiles for fragment clamping on non-clamping APIs.Eric Anholt2013-04-251-1/+1
| | | | | | | | | | Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are due to FBO-rendering size predictions). We currently expose GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm about to send a patch for removing that silly extension in that case. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Silence one more compile warning.Eric Anholt2013-04-121-0/+1
| | | | | | | | We don't want to store this thing in the class, and we do need the definition to be at the top of the function and held onto until the end here, so there's not much to do besides (void) reference it. Reviewed-by: Matt Turner <[email protected]>
* i965: Fix an unused variable warning in the release build.Eric Anholt2013-04-121-1/+0
| | | | | | It's used in an assert, but we have this as a member of the class anyway. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix some untriggered optimization bugs with uncompressed/sechalf.Eric Anholt2013-04-121-4/+4
| | | | | | | | | We have this support for firsthalf/sechalf instructions, which would be called in the !has_compr4 (aka original gen4) 16-wide case. We currently only support 16-wide for gen5+, so we weren't tripping over this, but it would have been a problem if we ever try to enable it. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add basic-block-level dead code elimination.Eric Anholt2013-04-121-0/+160
| | | | | | | | | | | | | | | | | This is a poor substitute for proper global dead code elimination that could replace both our current paths, but it was very easy to write. It particularly helps with Valve's shaders that are translated out of DX assembly, which has been register allocated and thus have a bunch of unrelated uses of the same variable (some of which get copy-propagated from and then left for dead). shader-db results: total instructions in shared programs: 1735753 -> 1731698 (-0.23%) instructions in affected programs: 492620 -> 488565 (-0.82%) v2: Fix comment typo Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Remove incorrect note of writing attr in centroid workaround.Eric Anholt2013-04-121-1/+1
| | | | | | | | This instruction doesn't update its IR destination, it just moves from payload to f0. This caused the dead code elimination pass I'm adding to dead-code-eliminate the first step of interpolation. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a helper function for checking for partial register updates.Eric Anholt2013-04-121-14/+20
| | | | | | | | These checks were all over, and every time I wrote one I had to try to decide again what the cases were for partial updates. v2: Fix inadvertent reladdr check removal. Reviewed-by: Matt Turner <[email protected]>
* i965: NULL check prog on shader compilation failure.Matt Turner2013-04-111-4/+6
| | | | | | Also change if (shader) to if (prog) for consistency. Reviewed-by: Eric Anholt <[email protected]>
* i965: Rename backend_visitor::prog to shader_prog.Paul Berry2013-04-111-4/+4
| | | | | | | | | | | | The next patch is going to change the type of vec4_visitor::vp from struct gl_vertex_program * to struct gl_program *, and rename it. The sensible name to change it to is vec4_visitor::prog. However, prog is already used in backend_visitor (which vec4_visitor derives from). Since backend_visitor::prog is of type struct gl_shader_program *, it makes sense to rename it to shader_prog. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.Kenneth Graunke2013-04-041-12/+0
| | | | | | | | | The previous commit removed the last user of this field, so there's no longer any point in setting it. Removing this should eliminate state-dependent recompiles, and make the precompile more reliable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove fixed-function texture projection avoidance optimization.Kenneth Graunke2013-04-041-25/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This optimization attempts to avoid extra attribute interpolation instructions for texture coordinates where the W-component is 1.0. Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes state atom (all the brw_vs_constval.c code) needs to run on each draw. It computes the input_size_masks array, then uses that to compute proj_attrib_mask. Differences in proj_attrib_mask can cause state-dependent fragment shader recompiles. We also often fail to guess proj_attrib_mask for the fragment shader precompile, causing us to needlessly compile it twice. Furthermore, this optimization only applies to fixed-function programs; it does not help modern GLSL-based programs at all. Generally, older fixed-function programs run fine on modern hardware anyway. The optimization has existed in some form since the initial commit. When we rewrote the fragment shader backend, we dropped it for a while. Eric readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of an attempt to cure a ~1% performance regression caused by converting the fixed-function fragment shader generation code from Mesa IR to GLSL IR. However, no performance data was included in the commit message, so it's unclear whether or not it was successful. Time has passed, so I decided to re-measure this. Surprisingly, Eric's OpenArena timedemo actually runs /faster/ after removing this and the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a 1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was no statistically significant difference (n = 37). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use LD messages for pre-gen7 varying-index uniform loadsEric Anholt2013-04-011-48/+44
| | | | | | | | | | This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on my GM45 forced to load all uniforms through the varying-index path), but we get a whole vec4 at a time to reuse in the next commit. v2: Fix comment about channels in the other message. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Don't double-emit SEND dependency workarounds at control flow.Eric Anholt2013-04-011-0/+2
| | | | | | | | | We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Bake regs_written into the IR instead of recomputing it later.Eric Anholt2013-04-011-19/+10
| | | | | | | | | For sampler messages, it depends on the target gen, and on gen4 SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we should. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Improve performance of varying-index uniform loads on IVB.Eric Anholt2013-04-011-5/+24
| | | | | | | | | | | | | | | | | | | Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet). With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1. v2: Move old offset computation into the pre-gen7 path. Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.
* i965/fs: Avoid inappropriate optimization with regs_written > 1.Eric Anholt2013-04-011-0/+6
| | | | | | | | Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the fragment shader pull constants index by dwords, not vec4s.Eric Anholt2013-04-011-1/+4
| | | | | | | | | | | | | | | | | We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move varying uniform offset compuation into the helper func.Eric Anholt2013-04-011-7/+9
| | | | | | | | I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove creation of a MOV instruction that's never used.Eric Anholt2013-04-011-1/+0
| | | | | | | | We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.Kenneth Graunke2013-03-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good. However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add names for all instructions to dump_instruction() in FS and VS.Eric Anholt2013-03-291-19/+1
| | | | | | | I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Generate LOD sampler message from ir_lod.Matt Turner2013-03-291-1/+3
| | | | | v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Include everything but the final FB write in shader_time.Eric Anholt2013-03-281-3/+0
| | | | | | | | Previously, if you just wrote a constant color to the render target, no time got noted at all. This is convenient for doing single-instruction timings, but not so much for actual program analysis. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch shader_time writes to using GRFs.Eric Anholt2013-03-281-13/+16
| | | | | | | | | This avoids conflicts between shader_time and FB writes, so we can include more of the program under our profiling. This does mean hiding more of the message setup from the optimizer, which doesn't have a way to handle multi-reg sends from GRFs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Track ARB program state along with GLSL state for shader_time.Eric Anholt2013-03-281-12/+2
| | | | | | This will let us do much better printouts for non-GLSL programs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Rename vp_outputs_written to input_slots_valid.Paul Berry2013-03-241-3/+3
| | | | | | | | | | | | | | With the introduction of geometry shaders, fragment inputs will no longer come exclusively from the vertex shader; sometimes they come from the geometry shader. So the name "vp_outputs_written" will become a misnomer. This patch renames vp_outputs_written to input_slots_valid, to reflect the true meaning of the bitfield from the fragment shader's point of view: it indicates which of the possible input slots contain valid data that was written by the previous shader stage. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Avoid unnecessary recompiles due to POS bit of proj_attrib_mask.Paul Berry2013-03-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | Previous to this patch, when using fixed function fragment shading, bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask was being set differently during precompiles and normal usage. During precompiles it was being set only if the fragment shader reads from window position (which it never does), so it was always being set to 0. During normal usage it was being set if the vertex shader writes to all 4 components of gl_Position (which it usually does), so it was usually being set to 1. As a result, we were almost always doing an extra recompile for the fixed function fragment shader. The recompile was totally unnecessary, though, because brw_wm_prog_key::proj_attrib_mask is only consulted for fs_visitor::emit_general_interpolation(), which isn't used for VARYING_SLOT_POS. This patch avoids the unnecessary recompile by always setting bit VARYING_BIT_POS of brw_wm_prog_key::proj_attrib_mask to 1. Reviewed-by: Kenneth Graunke <[email protected]>
* Replace gl_frag_attrib enum with gl_varying_slot.Paul Berry2013-03-151-12/+12
| | | | | | | | | | | | This patch makes the following search-and-replace changes: gl_frag_attrib -> gl_varying_slot FRAG_ATTRIB_* -> VARYING_SLOT_* FRAG_BIT_* -> VARYING_BIT_* Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* Get rid of _mesa_vert_result_to_frag_attrib().Paul Berry2013-03-151-8/+4
| | | | | | | | | | | | | Now that there is no difference between the enums that represent vertex outputs and fragment inputs, there's no need for a conversion function. But we still need to be able to detect when a given vertex output has no corresponding fragment input. So it is replaced by a new function, _mesa_varying_slot_in_fs(), which tells whether the given varying slot exists as an FS input or not. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* Replace gl_vert_result enum with gl_varying_slot.Paul Berry2013-03-151-4/+4
| | | | | | | | | | | This patch makes the following search-and-replace changes: gl_vert_result -> gl_varying_slot VERT_RESULT_* -> VARYING_SLOT_* Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* i965: Change fragment input related bitfields to 64-bit.Paul Berry2013-03-151-3/+4
| | | | | | | | | | | | | | This patch updates the bitfields brw_context::wm.input_size_masks, tracker::size_masks, and brw_wm_prog_key::proj_attrib_mask, all of which are indexed by gl_frag_attrib, from 32-bit to 64-bit. This paves the way for supporting geometry shaders, and for merging the gl_frag_attrib and gl_vert_result enums. The combination of these two will require at least 55 bits in the bitfields. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Tested-by: Brian Paul <[email protected]>
* i965: Split shader_time entries into separate cachelines.Eric Anholt2013-03-141-1/+1
| | | | | | | | | | | | | This avoids some snooping overhead between EUs processing separate shaders (so VS versus FS). Improves performance of a minecraft trace with shader_time by 28.9% +/- 18.3% (n=7), and performance of my old GLSL demo by 93.7% +/- 0.8% (n=4). v2: Add a define for the stride with a comment explaining its units and why. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.Eric Anholt2013-03-111-9/+15
| | | | | | | | | | | | We were handling the the dependency workaround for the first written reg of a send preceding the one we're fixing up, but didn't consider the other regs. Thus if you had two sampler calls that got allocated to the same set of regs, one might, rarely, ovewrite the other. This was occurring in XBMC's GLSL shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567 NOTE: This is a candidate for the stable branches. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch to using sampler LD messages for uniform pull constants.Eric Anholt2013-03-111-18/+21
| | | | | | | | | When forcing the compiler to always generate pull constants instead of push constants (in order to have an easy to use testcase), improves performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866 Reviewed-by: Kenneth Graunke <[email protected]>