mesa.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message	Author	Age	Files	Lines
*	i965/fs: Improve performance of copy/constant propagation.	Eric Anholt	2012-10-08	1	-2/+1
\| \| \| \| \| \| \| \|	Use a simple chaining hash table for the ACP. This is not really very good, because we still do a full walk of the tree per destination write, but it still reduces fp-long-alu runtime from 5.3 to 3.9s. Reviewed-by: Kenneth Graunke <[email protected]>
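As a rough illustration of the chaining hash table described above, here is a minimal sketch. The names `AcpEntry`, `Acp` and `kAcpBuckets`, the bucket count, and the choice to key on the destination register are all assumptions made for this example, not taken from the commit. A lookup scans a single chain, while a destination write still visits every bucket, which matches the "full walk" cost the message mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Assumed bucket count; the real value is not stated in the commit message.
constexpr std::size_t kAcpBuckets = 16;

// Hypothetical record for an available copy "MOV dst, src".
struct AcpEntry {
    int dst;  // virtual register written by the copy
    int src;  // virtual register read by the copy
};

class Acp {
public:
    // Record a copy, chaining it in the bucket selected by its destination.
    void add(int dst, int src) {
        AcpEntry e = {dst, src};
        buckets_[bucket(dst)].push_back(e);
    }

    // Return the register that `reg` is a known copy of, or -1 if none.
    // Only the chain for `reg` is scanned.
    int lookup(int reg) const {
        for (const AcpEntry &e : buckets_[bucket(reg)])
            if (e.dst == reg)
                return e.src;
        return -1;
    }

    // A write to `reg` invalidates copies into it and copies out of it.
    // Entries reading `reg` can live in any bucket, so every chain is walked.
    void kill(int reg) {
        for (std::size_t b = 0; b < kAcpBuckets; ++b)
            buckets_[b].remove_if([reg](const AcpEntry &e) {
                return e.dst == reg || e.src == reg;
            });
    }

private:
    static std::size_t bucket(int reg) {
        assert(reg >= 0);
        return static_cast<std::size_t>(reg) % kAcpBuckets;
    }

    std::list<AcpEntry> buckets_[kAcpBuckets];
};
```

Keying on the destination keeps the common operation (checking whether an instruction source is a known copy) cheap; the expensive path is limited to register writes.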
*	i965/fs: Move constant propagation to the same codebase as copy prop.	Eric Anholt	2012-10-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This means that we don't get constant prop across into the first block after a BRW_OPCODE_IF or a BRW_OPCODE_DO, but we have hope for properly doing it across control flow at some point. More importantly, with the next commit it will help avoid O(n^2) with instruction count runtime for shaders that have many constant moves. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Replace brw_wm_* with dumping code into the fs_visitor.	Eric Anholt	2012-10-08	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \|	This makes a giant pile of code newly dead. It also fixes TXB on newer chipsets, which has been totally broken (I now have a piglit test for that). It passes the same set of Ian's ARB_fragment_program tests. It also improves high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better optimization and having 8-wide along with 16-wide shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Pull ir_binop_min/ir_binop_max handling to a separate function.	Eric Anholt	2012-10-08	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This will be reused from the ARB_fp compiler. I touched up the pre-gen6 path to not overwrite dst in the first instruction, which prevents the need for aliasing checks (we'll need that in the ARB_fp compiler, but it actually hasn't been needed in this codebase since the revert of the nasty old MOV-avoidance code). I also made the conditional_mod between gen6 and pre-gen6 consistent, which shouldn't matter except for denorm/(+/-)0 comparisons where the choice between left and right hand side of the comparison changes. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Refactor rectangle/GL_CLAMP texture coordinate adjustment.	Eric Anholt	2012-10-08	1	-1/+2
\| \| \| \| \| \| \| \|	We'll want to reuse this for ARB_fp handling. v2: Fold the remaining bit of emit_texcoord back into visit(ir_texture). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Pass fragment depth to the fb write as a fs_reg, not an ir_variable.	Eric Anholt	2012-10-08	1	-1/+1
\| \| \| \| \| \|	This will be used for the ARB_fp change to use this backend. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Index sampler program key data by linker-assigned index.	Kenneth Graunke	2012-08-27	1	-1/+1
\| \| \| \| \| \| \| \| \|	Now that most things are based on the linker-assigned index, it makes sense to convert the arrays in the VS/WM program key as well. It seems silly to leave them indexed by texture unit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Remove unused 'sampler' parameter in emit_texture_genX().	Kenneth Graunke	2012-08-25	1	-6/+3
\| \| \| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Communicate the pull constant block read parameters through fs_regs.	Eric Anholt	2012-08-07	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	I wanted to add the surface index as a variable value for UBO support, and a reg seemed like the obvious way to go. This exposes more of the information to CSE, which we'll probably want to apply to pull constant loads for UBOs eventually (you might access 4 floats in a row, each of which would produce an oword block read of the same block). Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Don't clobber sampler message MRFs with subexpressions.	Kenneth Graunke	2012-08-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider a texture call such as: textureLod(s, coordinate, log2(...)) First, we begin setting up the sampler message by loading the texture coordinates into MRFs, starting with m2. Then, we realize we need the LOD, and go to compute it with: ir->lod_info.lod->accept(this); On Gen4-5, this will generate a SEND instruction to compute log2(), loading the operand into m2, and clobbering our texcoord. Similar issues exist on Gen6+. For example, nested texture calls: textureLod(s1, c1, texture(s2, c2).x) Any texturing call where evaluating the subexpression trees for LOD or shadow comparator would generate SEND instructions could potentially break. In some cases (like register spilling), we get lucky and avoid the issue by using non-overlapping MRF regions. But we shouldn't count on that. Fixes four Piglit test regressions on Gen4-5: - glsl-fs-shadow2DGradARB-{01,04,07,cumulative} NOTE: This is a candidate for stable release branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52129 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Factor out texcoord setup into a helper function.	Kenneth Graunke	2012-08-06	1	-0/+1
\| \| \| \| \| \| \| \| \|	With the textureRect support and GL_CLAMP workarounds, it's grown sufficiently that it deserves its own function. Separating it out makes the original function much more readable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Move message header and texture offset setup to generate_tex().	Kenneth Graunke	2012-08-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Setting the texture offset bits in the message header involves very specific hardware register descriptions. As such, I feel it's better suited for the lower level "generate" layer that has direct access to the weird register layouts, rather than at the fs_inst abstraction layer. This also parallels the approach I took in the VS backend. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Replace fs_visitor::kill_emitted with gl_fragment_program::UsesKill.	Paul Berry	2012-07-20	1	-1/+0
\| \| \| \| \| \| \|	The kill_emitted variable was duplicating the functionality of gl_fragment_program::UsesKill. There's no need for both. Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs.h: Refactor tests for instructions modifying a register.	Eric Anholt	2012-07-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	There's one instance of a potential behavior change: propagate_constants may now propagate into a part of a vgrf after a different part of it was overwritten by a send that returns multiple registers. I don't think we ever generate IR that meets that condition, but it's something to note if we bisect behavior change to this. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Rename virtual_grf_next to virtual_grf_count.	Eric Anholt	2012-07-18	1	-1/+1
\| \| \| \| \| \| \|	"count" is a more useful name, since most of the time we're using it for looping over the variables. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Move class functions from the header to .cpp files.	Eric Anholt	2012-07-06	1	-278/+26
\| \| \| \| \| \| \|	Cuts compile time for brw_fs.h changes from 2.7s to .7s and reduces i965_dri.so size by 70k. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Move copy propagation test out to a separate function.	Eric Anholt	2012-07-03	1	-0/+4
\| \| \| \| \| \|	It's going to get more complicated in a moment. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add FS_OPCODE_MOV_DISPATCH_TO_FLAGS to fragment shader backend.	Paul Berry	2012-07-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to compute centroid varyings correctly, the fragment shader needs to be able to load the current pixel/sample mask into a flag register. This patch adds an opcode to the fragment shader back-end to do this; the opcode gets translated into the instruction mov(1) f0<1>UW g1.14<0,1,0>UW { align1 WE_all } Since this instruction clobbers f0, instruction scheduling has to treat it the same as instructions that have a conditional modifier. Reviewed-by: Eric Anholt <[email protected]>
*	i965/msaa: Add backend support for centroid interpolation.	Paul Berry	2012-06-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch causes the fragment shader to be configured correctly (and the correct code to be generated) for centroid interpolation. This required two changes: brw_compute_barycentric_interp_modes() needs to determine when centroid barycentric coordinates need to be included in the pixel shader thread payload, and fs_visitor::emit_general_interpolation() needs to interpolate using the correct set of barycentric coordinates. Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4} centroid-edges" on i965. Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Refactor interpolation code to prepare for adding centroid support.	Paul Berry	2012-06-25	1	-0/+2
\| \| \| \|	Reviewed-by: Eric Anholt <[email protected]>
*	i965: Compute dFdy() correctly for FBOs.	Paul Berry	2012-06-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On i965, dFdx() and dFdy() are computed by taking advantage of the fact that each consecutive set of 4 pixels dispatched to the fragment shader always constitutes a contiguous 2x2 block of pixels in a fixed arrangement known as a "sub-span". So we calculate dFdx() by taking the difference between the values computed for the left and right halves of the sub-span, and we calculate dFdy() by taking the difference between the values computed for the top and bottom halves of the sub-span. However, there's a subtlety when FBOs are in use: since FBOs use a coordinate system where the origin is at the upper left, and window system framebuffers use a coordinate system where the origin is at the lower left, the computation of dFdy() needs to be negated for FBOs. This patch modifies the fragment shader back-ends to negate the value of dFdy() when an FBO is in use. It also modifies the code that populates the program key (brw_wm_populate_key() and brw_fs_precompile()) so that they always record in the program key whether we are rendering to an FBO or to a window system framebuffer; this ensures that the fragment shader will get recompiled when switching between FBO and non-FBO use. This will result in unnecessary recompiles of fragment shaders that don't use dFdy(). To fix that, we will need to adapt the GLSL and NV_fragment_program front-ends to record whether or not a given shader uses dFdy(). I plan to implement this in a future patch series; I've left FIXME comments in the code as a reminder. Fixes Piglit test "fbo-deriv". NOTE: This is a candidate for stable release branches. Reviewed-by: Kenneth Graunke <[email protected]>
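The sub-span arithmetic described above can be sketched as follows. This is a scalar model, not the generated shader code: the `SubSpan` struct, its field layout, and the `render_to_fbo` flag name are assumptions for this example, and the sign convention (top minus bottom for a window-system framebuffer) is inferred from the origin description in the message.

```cpp
#include <cassert>

// Hypothetical view of one 2x2 sub-span: a value computed per pixel.
struct SubSpan {
    float top_left, top_right, bottom_left, bottom_right;
};

enum Row { ROW_TOP, ROW_BOTTOM };
enum Column { COLUMN_LEFT, COLUMN_RIGHT };

// dFdx: difference between the right and left halves, taken along one row.
float ddx(const SubSpan &s, Row row) {
    return row == ROW_TOP ? s.top_right - s.top_left
                          : s.bottom_right - s.bottom_left;
}

// dFdy: difference between the top and bottom halves, taken along one column.
// With an FBO the origin is at the upper left, so the result is negated.
float ddy(const SubSpan &s, Column column, bool render_to_fbo) {
    const float d = column == COLUMN_LEFT ? s.top_left - s.bottom_left
                                          : s.top_right - s.bottom_right;
    return render_to_fbo ? -d : d;
}
```

Because the flag changes the emitted arithmetic, it has to be part of whatever key selects the compiled shader, which is why the commit records FBO versus window-system rendering in the program key.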
*	i965/fs: Fix user-defined FS outputs with less than four components.	Kenneth Graunke	2012-06-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	OpenGL allows you to declare user-defined fragment shader outputs with less than four components: out ivec2 color; This makes sense if you're rendering to an RG format render target. Previously, we assumed that all color outputs had four components (like the built-in gl_FragColor/gl_FragData variables). This caused us to call emit_color_write for invalid indices, incrementing the output virtual GRF's reg_offset beyond the size of the register. This caused cascading failures: split_virtual_grfs would allocate new size-1 registers based on the virtual GRF size, but then proceed to rewrite the out-of-bounds accesses assuming that it had allocated enough new (contiguously numbered) registers. This resulted in instructions that accessed size-1 GRFs with register numbers beyond virtual_grf_next (i.e. registers that were never allocated). Finally, this manifested as live variable analysis and instruction scheduling accessing their temporary array with an out of bounds index (as they're all sized based on virtual_grf_next), and the program would segfault. It looks like the hardware's Render Target Write message requires you to send four components, even for RT formats such as RG or RGB. This patch continues to use all four MRFs, but doesn't bother to fill any data for the last few, which should be unused. +2 oglconforms. NOTE: This is a candidate for stable release branches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/gen6+: Add support for GL_ARB_blend_func_extended.	Eric Anholt	2012-05-23	1	-0/+1
\| \| \| \| \| \| \|	v2: Add support for gen6, and don't turn it on if blending is disabled. (fixes GPU hang), and note it in docs/GL3.txt Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Do more register coalescing by using the interference graph.	Eric Anholt	2012-05-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	By using the live variables code for determining interference, we can handle coalescing in the presence of control flow, which the other register coalescing path couldn't. Total instructions: 207184 -> 206990 74/1246 programs affected (5.9%) 33993 -> 33799 instructions in affected programs (0.6% reduction) There is a newerth shader that loses out, because of some extra MOVs that now get their dead-code nature obscured by coalescing. This should be fixed by doing better at dead code elimination.
*	Revert "i965/fs: Jump from discard statements to the end of the program when done."	Eric Anholt	2012-05-14	1	-22/+0
\| \| \| \| \| \| \| \| \| \|	This reverts commit 31866308fcf989df992ace28b5b986c3d3770e90. Fixes piglit glsl-fs-discard-exit-3 and unigine tropics rendering. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add support for copy propagation.	Eric Anholt	2012-05-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	We could do more by handling abs/negate and non-GRF sources, but this is a good start. Improves tropics performance 0.30% +/- .17% (n=43). shader-db results: Total instructions: 208032 -> 207184 60/1246 programs affected (4.8%) 23286 -> 22438 instructions in affected programs (3.6% reduction) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Add a local common subexpression elimination pass.	Kenneth Graunke	2012-05-14	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Total instructions: 18210 -> 17836 49/163 programs affected (30.1%) 12888 -> 12514 instructions in affected programs (2.9% reduction) This reduces Lightsmark's "Scale down filter" shader from 395 instructions to 283, a whopping 28%. It also reduces register pressure significantly: the SIMD8 program now uses 29 registers instead of 101, giving us more than enough room for a SIMD16 program. v2: Add && !inst->conditional_mod to the "skip some instructions" check. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
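For readers unfamiliar with the technique, a local (single basic block) common subexpression elimination pass can be sketched roughly as below. The `Inst` struct, the string opcodes, and the function name `local_cse` are hypothetical, and details from the commit such as the conditional_mod check are not modeled; this only shows the general shape of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-address instruction; src1 is -1 when unused.
struct Inst {
    std::string op;
    int dst, src0, src1;
};

// Rewrites repeated expressions within one basic block into copies of the
// earlier result. Returns the number of instructions rewritten.
int local_cse(std::vector<Inst> &block) {
    std::vector<Inst> available;  // expressions whose results are still valid
    int replaced = 0;

    for (Inst &inst : block) {
        bool rewritten = false;
        if (inst.op != "mov") {
            for (const Inst &e : available) {
                if (e.op == inst.op && e.src0 == inst.src0 &&
                    e.src1 == inst.src1) {
                    // Same computation already done: copy its result.
                    inst = Inst{"mov", inst.dst, e.dst, -1};
                    ++replaced;
                    rewritten = true;
                    break;
                }
            }
        }

        // The write to inst.dst invalidates entries that read or wrote it.
        const int written = inst.dst;
        available.erase(
            std::remove_if(available.begin(), available.end(),
                           [written](const Inst &e) {
                               return e.dst == written || e.src0 == written ||
                                      e.src1 == written;
                           }),
            available.end());

        // Remember new expressions, unless they overwrite their own input.
        if (!rewritten && inst.op != "mov" && inst.dst != inst.src0 &&
            inst.dst != inst.src1)
            available.push_back(inst);
    }
    return replaced;
}
```

The rewritten copies are what later passes such as copy propagation and dead code elimination would clean up, which is one way a pass like this can lower both instruction count and register pressure.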
*	i965/fs: Use a const reference in fs_reg::equals instead of a pointer.	Kenneth Graunke	2012-05-14	1	-14/+14
\| \| \| \| \| \| \| \| \| \|	This lets you omit some ampersands and is more idiomatic C++. Using const also marks the function as not altering either register (which was obvious, but nice to enforce). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Fix regression in comparison handling from ANDs change.	Eric Anholt	2012-05-04	1	-0/+1
\| \| \| \| \| \| \|	I had fixed up the logic ops for delayed ANDing, but not equality comparisons on bools. Fixes new piglit fs-bool-less-compare-true. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48629
*	i965: Add basic block generator.	Eric Anholt	2012-04-19	1	-0/+4
\| \| \| \| \| \|	This takes the fs_inst list generated by the visitor, and generates a list of basic blocks with edges between them. This is a building block for data-flow analysis.
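To show what "a list of basic blocks with edges between them" means in practice, here is a generic leader-based partitioning sketch. It is a textbook approach over a flat instruction vector with explicit jump targets, not necessarily how the commit walks the structured fs_inst list; `Inst`, `Block` and `build_cfg` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction: a branch names its target instruction index.
struct Inst {
    bool is_branch;      // may transfer control to `target`
    int target;          // valid only when is_branch is true
    bool falls_through;  // may continue to the next instruction
};

struct Block {
    int start, end;          // inclusive instruction range
    std::vector<int> succs;  // indices of successor blocks
};

std::vector<Block> build_cfg(const std::vector<Inst> &insts) {
    const int n = static_cast<int>(insts.size());

    // A leader starts a block: the first instruction, every branch target,
    // and every instruction that follows a branch.
    std::vector<bool> leader(n, false);
    if (n > 0)
        leader[0] = true;
    for (int i = 0; i < n; ++i) {
        if (!insts[i].is_branch)
            continue;
        assert(insts[i].target >= 0 && insts[i].target < n);
        leader[insts[i].target] = true;
        if (i + 1 < n)
            leader[i + 1] = true;
    }

    // Cut the instruction list at each leader.
    std::vector<Block> blocks;
    std::vector<int> block_of(n, -1);
    for (int i = 0; i < n; ++i) {
        if (leader[i]) {
            Block b;
            b.start = i;
            b.end = i;
            blocks.push_back(b);
        }
        blocks.back().end = i;
        block_of[i] = static_cast<int>(blocks.size()) - 1;
    }

    // Add edges based on the last instruction of each block.
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const Inst &last = insts[blocks[b].end];
        if (last.is_branch)
            blocks[b].succs.push_back(block_of[last.target]);
        if (last.falls_through && b + 1 < blocks.size())
            blocks[b].succs.push_back(static_cast<int>(b) + 1);
    }
    return blocks;
}
```

Data-flow analyses such as live variables then iterate over these blocks and their successor edges instead of over individual instructions.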
*	i965/fs: Try to avoid generating extra MOVs to do saturates.	Eric Anholt	2012-04-11	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	This change (before the previous two) produced a .23% +/- .11% performance improvement in Unigine Tropics at 1024x768 on IVB. Total instructions: 269270 -> 262649 614/2148 programs affected (28.6%) 179386 -> 172765 instructions in affected programs (3.7% reduction) v2: Move some of the logic of finding the instruction that produced the result of an expression tree to a helper.
*	i965/fs: Jump from discard statements to the end of the program when done.	Eric Anholt	2012-03-16	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From the GLSL 1.30 spec: The discard keyword is only allowed within fragment shaders. It can be used within a fragment shader to abandon the operation on the current fragment. This keyword causes the fragment to be discarded and no updates to any buffers will occur. Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this control flow is non-uniform (meaning different fragments within the primitive take different control paths). v2: Don't emit the final HALT if no other HALTs were emitted. Reviewed-by: Kenneth Graunke <[email protected]> (v1)
*	i965/fs: Add a new fs_inst::regs_written function.	Kenneth Graunke	2012-02-15	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	Certain instructions write more than one register. Texturing, for example, returns 4 registers. (We set rlen to 4 even for TXS and float shadow sampling.) Some math functions return 2. Most return 1. The next commit introduces a use of this function. NOTE: This is a candidate for the 8.0 branch (dependency of a fix). Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
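A helper of this kind might look roughly like the sketch below. The opcode names and the `Inst` layout are hypothetical; only the register counts (4 for texturing including TXS, 2 for some math functions, 1 otherwise) come from the message above. The second function shows the sort of caller that benefits: a check for whether an instruction overwrites a given register.

```cpp
#include <cassert>

// Hypothetical opcodes standing in for the backend's real ones.
enum Opcode {
    OP_ADD,
    OP_TEX,
    OP_TXS,
    OP_MATH_TWO_RESULTS,  // assumed name for a math op returning 2 registers
};

struct Inst {
    Opcode op;
    int dst;  // first register written
};

// Number of consecutive registers an instruction writes, starting at dst.
int regs_written(const Inst &inst) {
    switch (inst.op) {
    case OP_TEX:
    case OP_TXS:
        return 4;  // texturing returns 4 registers, TXS included
    case OP_MATH_TWO_RESULTS:
        return 2;
    default:
        return 1;
    }
}

// True if `inst` writes register `reg`, accounting for multi-register writes.
bool overwrites(const Inst &inst, int reg) {
    return reg >= inst.dst && reg < inst.dst + regs_written(inst);
}
```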
*	i965/fs: Add support for generating MADs.	Eric Anholt	2012-02-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Improves nexuiz performance 0.65% +/- .10% (n=5) on my gen6, and .39% +/- .11% (n=10) on gen7. No statistically significant performance difference on warsow (n=5, but only one shader has MADs). v2: Add support for MADs in 16-wide by using compression control. v3: Don't generate MADs when it will force an immediate to be moved to a temp. (it's not clear whether this is a win or not, but it should result in less questionable change to codegen compared to v2). Reviewed-by: Kenneth Graunke <[email protected]> (v2)
*	i965/fs: Fix rendering corruption in unigine tropics.	Eric Anholt	2012-01-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	We were allocating registers into the MRF hack region, resulting in sparkly rendering in a few of the scenes. We could do better allocation by making an MRF class, having MRFs conflict with the corresponding GRFs, and tracking the live intervals of the "MRF"s and setting up the conflicts. But this is way easier for the moment. NOTE: This is a candidate for the 8.0 branch. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Rename texturing ops from FS_OPCODE to SHADER_OPCODE, except TXB.	Kenneth Graunke	2011-12-18	1	-5/+5
\| \| \| \| \| \| \| \|	We'll be reusing most of these for the VS shortly. The one exception is TXB (texturing with LOD bias), which is explicitly forbidden in the VS. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Make register file enum 0 be the undefined register file.	Eric Anholt	2011-11-30	1	-6/+6
\| \| \| \| \| \| \| \| \| \|	In 6d874d0ee18b3694c49e0206fa519bd8b746ec24, I checked whether a register that had been stored was BAD_FILE (as opposed to a legitimate GRF), but actually the unset register was ARF NULL because it had been memset to 0. Finding BAD_FILE for unset values in debugging was my intention with that file, so make it the case more often by rearranging the enum. There was only one place we relied on the magic enum register_file to hardware register file correspondence anyway.
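The effect of the enum reordering can be illustrated with a small sketch. The `Reg` struct and the helper names are invented for this example, and the order of the enumerators after `BAD_FILE` is an assumption; the point is only that a zero-filled register now reads as unset rather than as a legitimate file.

```cpp
#include <cassert>
#include <cstring>

// With BAD_FILE as enumerator 0, zero-filled storage reads as "unset".
// The order of the remaining enumerators here is an assumption.
enum register_file {
    BAD_FILE = 0,
    ARF,
    GRF,
    MRF,
    IMM,
};

static_assert(BAD_FILE == 0, "memset-to-zero must yield BAD_FILE");

// Hypothetical, simplified register description.
struct Reg {
    register_file file;
    int nr;
};

// Mimics a structure that was cleared with memset before use.
Reg make_cleared_reg() {
    Reg r;
    std::memset(&r, 0, sizeof(r));
    return r;
}

bool is_unset(const Reg &r) {
    return r.file == BAD_FILE;
}
```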
*	i965/fs: Add support for user-defined out variables.	Eric Anholt	2011-11-09	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Before, I was tracking the ir_variable * found for gl_FragColor or gl_FragData[]. Instead, when visiting those variables, set up an array of per-render-target fs_regs to copy the output data from. This cleans up the color emit path, while making handling of multiple user-defined out variables easier. v2: incorporate idr's feedback about ir->location (changes by Kenneth Graunke) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Enable faster workaround-free math on Ivybridge.	Kenneth Graunke	2011-11-07	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to the documentation, Ivybridge's math instruction works in SIMD16 mode for the fragment shader, and no longer forbids align16 mode for the vertex shader. The documentation claims that SIMD16 mode isn't supported for INT DIV, but empirical evidence shows that it works fine. Presumably the note is trying to warn us that the variant that returns both quotient and remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1 would be sechalf(dst), trashing half your results. Since we don't use that variant, we don't care and can just enable SIMD16 everywhere. The documentation also still claims that source modifiers and conditional modifiers aren't supported, but empirical evidence and study of the simulator both show that they work just fine. Goodbye workarounds. Math just works now. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
*	i965/gen6+: Parameterize barycentric interpolation modes.	Paul Berry	2011-10-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies the fragment shader back-end so that instead of using a single delta_x/delta_y register pair to store barycentric coordinates, it uses an array of such register pairs, one for each possible interpolation mode. When setting up the WM, we instruct it to only provide the barycentric coordinates that are actually needed by the fragment shader--that is computed by brw_compute_barycentric_interp_modes(). Currently this function returns just BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only interpolation mode we support. However, that will change in a later patch. Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Fix comparisons with uint negation.	Eric Anholt	2011-10-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	The condmod instruction ends up generating garbage condition codes, because apparently the comparison happens on the accumulator value (33 bits for UD), not the truncated value that would be written. Fixes fs-op-neg-* Reviewed-by: Ian Romanick <[email protected]>
*	intel: Convert from GLboolean to 'bool' from stdbool.h.	Kenneth Graunke	2011-10-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I initially produced the patch using this bash command: for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i 's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i 's/GL_FALSE/false/g' $file; done Then I manually added #include <stdbool.h> to fix compilation errors, and converted a few functions back to GLboolean that were used in core Mesa's function pointer table to avoid "incompatible pointer" warnings. Finally, I cleaned up some whitespace issues introduced by the change. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chad Versace <[email protected]> Acked-by: Paul Berry <[email protected]>
*	mesa: Use gl_shader_program::_LinkedShaders instead of FragmentProgram	Ian Romanick	2011-10-07	1	-1/+2
\| \| \| \| \| \|	Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: Implement integer quotient and remainder math operations.	Kenneth Graunke	2011-10-02	1	-0/+2
\| \| \| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Tested-by: Ian Romanick <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Split generate_math into gen4/gen6 and 1/2 operand variants.	Kenneth Graunke	2011-09-26	1	-1/+10
\| \| \| \| \| \| \| \| \| \|	This mirrors the structure Eric used in the new VS backend, and seems simpler. In particular, the math1/math2 split will avoid having to figure out how many operands there are, as this is already known by the caller. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Implement texelFetch() on Ironlake and Sandybridge.	Kenneth Graunke	2011-09-19	1	-0/+1
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
*	i965: Fix Android build by removing relative includes	Chad Versace	2011-08-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Replace each occurrence of #include "../glsl/.h" with #include "glsl/.h" Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Chad Versace <[email protected]>
*	i965: Avoid generating MOVs for most ir_assignment handling.	Kenneth Graunke	2011-08-29	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a port of vec4_visitor::try_rewrite_rhs_to_dst to fs_visitor. Not only is this technique less invasive and more robust, it also generates better code. Over and above the previous technique, this reduced instruction count in shader-db by 0.28% on average and 1.4% in the best case. In no case did this technique result in more code than the prior method. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Eric Anholt <[email protected]>
*	i965/fs: Revert "Avoid generating MOVs for assignments for expressions."	Kenneth Graunke	2011-08-29	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 53c89c67f33639afef951e178f93f4e29acc5d53, along with the subsequent this->result = reg_undef additions it required. Both Eric and I agree that the way he did this is really fragile; if you forget to add this->result = reg_undef before calling accept(), it may end up using the same register for two separate things, breaking things in strange and mysterious ways. The next commit will port over the new VS backend's method for solving this problem, which is simpler, less intrusive, and still manages to avoid MOVs in the common case.
*	i965/fs: Implement textureSize (TXS) on Gen5+.	Kenneth Graunke	2011-08-23	1	-1/+2
\| \| \| \| \|	Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>