path: root/src/mesa/drivers/dri/i965
Commit message Author Age Files Lines
* intel: Silence several "warning: unused parameter"Ian Romanick2011-09-091-4/+3
| | | | | | The intel_context and tiling parameters were not used by any of the i9[14]5_miptree_layout functions or the functions they call, and the tiling parameter was not used by brw_miptree_layout. Remove the unnecessary parameters.
* i965/vs: Allow copy propagation on GRFs.Eric Anholt2011-09-081-1/+6
| | | | | Further reduces instruction count by 4.0% in 40.7% of the vertex shaders.
* i965/vs: Clear tracked copy propagation values whose source gets overwritten.Eric Anholt2011-09-081-3/+12
| | | | | This only occurs for GRFs, and hasn't mattered until now because we only copy propagated non-GRFs.
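The invalidation rule described above can be sketched in C. This is a minimal illustration, not the driver's code: the names `copy_table`, `copy_table_note_write` and the fixed register count are hypothetical, and only whole registers are tracked (no channels or swizzles).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_GRFS 8  /* hypothetical, small on purpose */

/* Hypothetical register files; only GRFs can be overwritten by the shader. */
enum reg_file { FILE_NONE, FILE_GRF, FILE_UNIFORM, FILE_ATTR };

struct reg {
    enum reg_file file;
    int nr;
};

/* src[d] is the register that GRF d is currently known to be a copy of. */
struct copy_table {
    struct reg src[NUM_GRFS];
};

void copy_table_init(struct copy_table *t)
{
    for (int i = 0; i < NUM_GRFS; i++) {
        t->src[i].file = FILE_NONE;
        t->src[i].nr = 0;
    }
}

/* Call for every instruction that writes GRF dst_nr. */
void copy_table_note_write(struct copy_table *t, int dst_nr)
{
    assert(dst_nr >= 0 && dst_nr < NUM_GRFS);

    /* The destination no longer holds whatever it was a copy of. */
    t->src[dst_nr].file = FILE_NONE;

    /* Any tracked copy whose source was this GRF is now stale. */
    for (int i = 0; i < NUM_GRFS; i++) {
        if (t->src[i].file == FILE_GRF && t->src[i].nr == dst_nr)
            t->src[i].file = FILE_NONE;
    }
}

/* Call for "MOV g<dst_nr>, src". */
void copy_table_record_mov(struct copy_table *t, int dst_nr, struct reg src)
{
    copy_table_note_write(t, dst_nr);
    if (src.file == FILE_GRF && src.nr == dst_nr)
        return; /* self-copy carries no information */
    t->src[dst_nr] = src;
}

/* Returns true and fills *out if GRF nr is a known copy of another register. */
bool copy_table_lookup(const struct copy_table *t, int nr, struct reg *out)
{
    assert(nr >= 0 && nr < NUM_GRFS);
    if (t->src[nr].file == FILE_NONE)
        return false;
    *out = t->src[nr];
    return true;
}
```

Entries whose source is a UNIFORM or ATTR never need the second loop, which is consistent with the message's note that the problem only appears once GRF sources are tracked.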
* i965/vs: Add support for copy propagation of the UNIFORM and ATTR files.Eric Anholt2011-09-083-1/+72
| | | | Removes 2.0% of the instructions from 35.7% of vertex shaders in shader-db.
* i965/vs: Add constant propagation to a few opcodes.Eric Anholt2011-09-085-0/+281
| | | | | | | | | | | This differs from the FS in that we track constants in each destination channel, and we have to look at all the swizzled source channels. Also, the instruction stream walk is done in an O(n) manner instead of O(n^2). Across shader-db, this removes 8.0% of the instructions from 60.0% of the vertex shaders, leaving us now behind the old backend by 11.1% overall.
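Per-channel constant tracking with swizzled sources might look roughly like the following C sketch. All names (`chan_consts`, `record_imm_mov`, `try_propagate`) are hypothetical, and the sketch checks all four swizzle channels, ignoring which channels the consuming instruction actually enables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one vec4 register: one tracked immediate per channel. */
struct chan_consts {
    bool known[4];
    float value[4];
};

/* Record "MOV dst.<writemask>, imm"; bit i of writemask selects channel i. */
void record_imm_mov(struct chan_consts *c, unsigned writemask, float imm)
{
    for (int i = 0; i < 4; i++) {
        if (writemask & (1u << i)) {
            c->known[i] = true;
            c->value[i] = imm;
        }
    }
}

/* Any other write makes the written channels unknown again. */
void record_other_write(struct chan_consts *c, unsigned writemask)
{
    for (int i = 0; i < 4; i++) {
        if (writemask & (1u << i))
            c->known[i] = false;
    }
}

/*
 * A swizzled source can be replaced by a single immediate only if every
 * channel it reads is known and all of them hold the same value.
 */
bool try_propagate(const struct chan_consts *c, const unsigned char swizzle[4],
                   float *imm_out)
{
    float first = 0.0f;

    for (int i = 0; i < 4; i++) {
        unsigned ch = swizzle[i];
        assert(ch < 4);
        if (!c->known[ch])
            return false;
        if (i == 0)
            first = c->value[ch];
        else if (c->value[ch] != first)
            return false;
    }
    *imm_out = first;
    return true;
}
```

Because the state is updated as each instruction is visited, a single forward walk suffices, which matches the O(n) behaviour the message mentions.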
* i965/vs: Keep track of indices into a per-register array for virtual GRFs.Eric Anholt2011-09-082-0/+15
| | | | | | | | | | | | Tracking virtual GRFs has tension between using a packed array per virtual GRF (which is good for register allocation), and sparse arrays where there's an element per actual register (so the first and second column of a mat2 can be distinguished inside of an optimization pass). The FS mostly avoided the need for this second sparse array by doing virtual GRF splitting, but that meant that where virtual GRF splitting didn't work, instructions using those registers got much less optimized.
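One way to keep both views is a prefix-sum table that maps each virtual GRF to its first slot in a flat per-register array. The C sketch below uses hypothetical names (`build_reg_index`, `flat_index`) and is only an illustration of the idea.

```c
#include <assert.h>

/*
 * sizes[i]     : number of hardware registers in virtual GRF i (packed view,
 *                one entry per virtual GRF).
 * reg_index[i] : filled in with the index of virtual GRF i's first register
 *                in a flat array that has one element per register.
 * Returns the total number of registers, i.e. the length of that flat array.
 */
int build_reg_index(const int *sizes, int count, int *reg_index)
{
    int total = 0;

    for (int i = 0; i < count; i++) {
        assert(sizes[i] > 0);
        reg_index[i] = total;
        total += sizes[i];
    }
    return total;
}

/* Flat index of register reg_offset within virtual GRF vgrf. */
int flat_index(const int *reg_index, int vgrf, int reg_offset)
{
    return reg_index[vgrf] + reg_offset;
}
```

With sizes {1, 2, 4}, the two columns of the size-2 entry (a mat2, say) land at flat indices 1 and 2, so an optimization pass can track them separately while the allocator still sees three virtual GRFs.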
* i965/vs: Switch to the new VS backend by default.Eric Anholt2011-09-081-1/+1
| | | | | | | | | | | Now instead of env INTEL_NEW_VS=1 to get it, you need INTEL_OLD_VS=1 to not get it. While it's not quite to the same codegen efficiency as the old backend, it is not regressing piglit on G965 and G45, and actually fixing bugs on gen6, and the remaining codegen quality regressions all appear tractable. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for overflowing the number of available push constants.Eric Anholt2011-09-083-0/+87
| | | | | | | | Fixes glsl-vs-uniform-array-4. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33742 Reviewed-by: Ian Romanick <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/vs: Pack uniform registers before optimizationEric Anholt2011-09-081-1/+1
| | | | | | | | | We don't expect uniform accesses to generally go away from being dead code at this point, and we will want to have uniforms packed before spilling them out to pull constants when we are forced to do that. Reviewed-by: Ian Romanick <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/vs: When failing due to lack of spilling, don't continue on.Eric Anholt2011-09-081-0/+1
| | | | | | | | Fixes assertion failure from double-free in oglc glsl-arrayobject constructor.declaration.structure Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix variable indexed array access with more than one array.Eric Anholt2011-09-081-1/+1
| | | | | | | | | The offset to the arrays after the first was mis-scaled, so we'd go access off the end of the surface and read 0s. Fixes glsl-vs-uniform-array-3. Reviewed-by: Ian Romanick <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/vs: Add annotation to more of the URB write.Eric Anholt2011-09-082-1/+5
| | | | | | | | While we had nice debug output for most of the instruction stream, it was terminated by a series of anonymous MOVs and a send. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel: add support for __DRI_IMAGE_FORMAT_ABGR8888Chia-I Wu2011-09-091-0/+4
| | | | | | | | | It maps to MESA_FORMAT_RGBA8888_REV. Surfaces of this format can be sampled from but not rendered to. Only i915 is tested. Reviewed-by: Eric Anholt <[email protected]> [olv: add a check in intel_image_target_renderbuffer_storage]
* i965/fs: Implement ir_u2f opcode.Kenneth Graunke2011-09-071-1/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix disassembly for intdiv/intmod math functions.Kenneth Graunke2011-09-071-2/+2
| | | | | | | | The opcodes and strings were reversed. Quotient means division, and modulus means remainder. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use proper texture alignment units for cubemaps on Gen5+.Kenneth Graunke2011-09-071-1/+4
| | | | | | | | | In particular, S3TC compressed textures need align_h == 4. Fixes skybox errors in Quake 4 and FEAR. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34628 Signed-off-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix point size handling on gen4.Eric Anholt2011-09-061-4/+5
| | | | | | Fixes glsl-vs-point-size. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use write commits on scratch writes in pre-gen6.Eric Anholt2011-09-061-2/+22
| | | | | | | This is required to ensure ordering between reads and writes within a thread. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix setup of scratch space pointer on pre-gen6.Eric Anholt2011-09-061-0/+10
| | | | | | | We were failing to relocate, so on the first draw run our scratch would tend to get written to 0x0. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix message setup for array read/writes on pre-gen6.Eric Anholt2011-09-061-18/+14
| | | | | | | | We were passing an MRF as the source argument, instead of using the implied move and putting the MRF number in the proper place in the instruction encoding. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix constant-indexed array read/write addresses on pre-gen6.Eric Anholt2011-09-061-1/+1
| | | | | | The second vertex was getting a garbage index. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add support for vector comparison ops resulting in bool cond codes.Eric Anholt2011-09-062-21/+33
| | | | | | Fixes a giant pile of VS tests on gen4. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Make pre-gen6 math operate in vector mode instead of scalar.Eric Anholt2011-09-061-1/+1
| | | | | | | | | | | On the old backend, we used scalar mode because Mesa IR math is result.xyzw = math(op0.xxxx), which matched up well. However, in GLSL IR we do things like result.xy = math(op0.xy), so we want vector mode. For the common case of result.x = math(op0.x), performance will be the same (no cost for un-executed channels), though result.xyzw = math(op0.xxxx) would be worse. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix copy-and-paste disaster in pre-gen6 POW support.Eric Anholt2011-09-061-5/+0
| | | | | | Fixes vs-pow-float-float and friends. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix gen4 comparisons used for predication.Eric Anholt2011-09-061-1/+4
| | | | | | | When we tried to retype a brw_null_reg() in CMP(), the retyping didn't take effect because HW_REG just ignores the type field. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Fix GPU hangs in shaders with large virtual GRFs pre-gen6.Eric Anholt2011-09-061-1/+2
| | | | | | | If you get your total GRF count wrong, you write over some other shader's g0, and the GPU fails shortly thereafter. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add casts to silence int/enum conversion warningsBrian Paul2011-09-061-2/+2
|
* mesa: put _mesa_ prefix on vert_result_to_frag_attrib()Brian Paul2011-09-064-5/+5
|
* i965: Remove two_side_color from brw_compute_vue_map().Paul Berry2011-09-0612-22/+11
| | | | | | | | | | | Since we now lay out the VUE the same way regardless of whether two-sided color is enabled, brw_compute_vue_map() no longer needs to know whether two-sided color is enabled. This allows the two-sided color flag to be removed from the clip, GS, and VS keys, so that fewer GPU programs need to be recompiled when turning two-sided color on and off. Reviewed-by: Eric Anholt <[email protected]>
* i965: For GEN6+, always make front/back colors adjacent in VUE.Paul Berry2011-09-061-16/+12
| | | | | | | | | | | | | | | | | | When doing two-sided color on GEN6+, we use the SF unit's INPUTATTR_FACING mode to cause front colors to be used on front-facing triangles, and back colors to be used on back-facing triangles. This mode requires that the front and back colors be adjacent in the VUE. Previously, we would only place front and back colors adjacent in the VUE when two-sided color was enabled. Now we place them adjacent in the VUE whether two-sided color is enabled or not. (We still only swizzle the colors when two-sided color is enabled, so there should be no user-visible change). This simplifies the implementation of the VUE map and reduces the amount of code that is dependent on two-sided color mode. Reviewed-by: Eric Anholt <[email protected]>
* i965: GS: Use the VUE map to compute URB size.Paul Berry2011-09-062-17/+15
| | | | | | | | | | | The previous computation had two bugs: (a) it used a formula based on Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact that PSIZ is stored in the VUE header. Fortunately, both bugs caused it to compute a URB size that was too large, which was benign. This patch computes the URB size directly from the VUE map, so it gets the result correct in all circumstances. Reviewed-by: Eric Anholt <[email protected]>
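Deriving the size from a slot map rather than from a per-generation formula can be sketched as below. The struct layout, the function names, the header slot count and the unit size are all assumptions made for illustration; the real values are hardware specific and are not stated in the message.

```c
#include <assert.h>

#define MAX_OUTPUTS 16  /* hypothetical number of vertex shader outputs */

/* Hypothetical VUE map: which slot each output occupies, or -1 if absent. */
struct vue_map {
    int slot_of[MAX_OUTPUTS];
    int num_slots;
};

/* Start with header_slots slots reserved for the VUE header. */
void vue_map_init(struct vue_map *m, int header_slots)
{
    for (int i = 0; i < MAX_OUTPUTS; i++)
        m->slot_of[i] = -1;
    m->num_slots = header_slots;
}

/* An output stored inside the header (e.g. point size) adds no new slot. */
void vue_map_add_in_header(struct vue_map *m, int output, int header_slot)
{
    assert(output >= 0 && output < MAX_OUTPUTS);
    m->slot_of[output] = header_slot;
}

/* Any other output takes the next free slot. */
void vue_map_add(struct vue_map *m, int output)
{
    assert(output >= 0 && output < MAX_OUTPUTS);
    if (m->slot_of[output] < 0)
        m->slot_of[output] = m->num_slots++;
}

/* Size in allocation units, rounding up; slots_per_unit is an assumption. */
int urb_size_in_units(const struct vue_map *m, int slots_per_unit)
{
    assert(slots_per_unit > 0);
    return (m->num_slots + slots_per_unit - 1) / slots_per_unit;
}
```

Counting an in-header output as an extra slot would overestimate the size, which is the benign kind of error described in (b).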
* i965: clip: Remove no-longer-needed variables.Paul Berry2011-09-062-33/+1
| | | | | | | | The variables offset[], idx_to_attr[], nr_bytes, nr_attrs, and header_regs were all serving purposes which are now served by the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Remove assumption about VUE header from brw_clip_interp_vertex()Paul Berry2011-09-061-5/+8
| | | | | | | | | | | Previously, brw_clip_interp_vertex() iterated only through the "non-header" elements of the VUE when performing interpolation (because header elements don't need interpolation). This code now refers exclusively to the VUE map to figure out which elements need interpolation, so that brw_clip_interp_vertex() doesn't need to know the header size. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Change computation of nr_regs to use VUE map.Paul Berry2011-09-061-5/+5
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Convert computations to ..._to_offset() for clarity.Paul Berry2011-09-063-19/+51
| | | | | | | | This patch replaces some ad-hoc computations using ATTR_SIZE and the offset[] array to use the VUE map functions brw_vert_result_to_offset() and brw_vue_slot_to_offset(). Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Add a function to determine whether a vert_result is in use.Paul Berry2011-09-063-9/+22
| | | | | | | | Previously we would examine the offset[] array (since an offset of 0 meant "not in use"). This paves the way for removing the offset[] array. Reviewed-by: Eric Anholt <[email protected]>
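A dedicated predicate avoids overloading an offset of 0 as "not in use". The C sketch below assumes the set of written outputs is available as a 64-bit mask; the struct and function names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clip key: bit n is set when vertex result n is written. */
struct clip_key {
    uint64_t attrs;
};

/* True if the given vertex result is in use, independent of its offset. */
bool clip_have_vert_result(const struct clip_key *key, unsigned vert_result)
{
    assert(vert_result < 64);
    return ((key->attrs >> vert_result) & 1u) != 0;
}
```

Callers then ask the question directly instead of comparing an offset against a sentinel, so the offset table itself can be removed later.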
* i965: clip: Rework brw_clip_interp_vertex() to use the VUE map.Paul Berry2011-09-061-5/+5
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Modify brw_clip_interp_vertex() to use the VUE map.Paul Berry2011-09-061-2/+2
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Move header_regs into brw_clip_compile.Paul Berry2011-09-062-5/+5
| | | | | | This makes header_regs available for computing VUE offsets within clip code. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Modify brw_clip_tri_alloc_regs() to use the VUE map.Paul Berry2011-09-061-2/+5
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Move hpos_offset and ndc_offset into local functions.Paul Berry2011-09-066-17/+29
| | | | | | | | | The offsets within the VUE of HPOS and NDC are needed only in a few auxiliary clipping functions. This patch moves computation of those offsets into the functions that need them, and does the computation using the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: rename header_position_offset to the more correct ndc_offset.Paul Berry2011-09-064-4/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: clip: Add VUE map computation to clip stage for Gen4-5.Paul Berry2011-09-062-1/+7
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Change gen{6,7}_sf_state.c to compute URB read length based on VUE map.Paul Berry2011-09-062-8/+24
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Move outputs_written to a local variable for clarity.Paul Berry2011-09-062-4/+6
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: New implementation of get_attr_override using the VUE map.Paul Berry2011-09-063-47/+78
| | | | | | | | This patch changes get_attr_override() (which computes the relationship between vertex shader outputs and fragment shader inputs) to use the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Remove unnecessary variables.Paul Berry2011-09-062-6/+2
| | | | | | | | This patch removes the variables nr_attrs and nr_setup_attrs, whose purpose is now being served by the VUE map. nr_attr_regs and nr_setup_regs are still needed; however, they are now computed using the VUE map rather than by counting the number of vertex shader outputs (which caused subtle bugs when gl_PointSize was written). Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Stop using nr_setup_attrs in compute_masks.Paul Berry2011-09-061-1/+1
| | | | | | | | Previously, the SF used nr_setup_attrs to determine whether it was looking at the last element of the VUE. Changed this code to use the VUE map. Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Remove attr_to_idx and idx_to_attr.Paul Berry2011-09-062-13/+1
| | | | | | | | These data structures were serving the same purpose as the VUE map, but were buggy. Now that the code has been transitioned to use the VUE map, they are not needed. Reviewed-by: Eric Anholt <[email protected]>
* i965: SF: Change calculate_masks to use the VUE map.Paul Berry2011-09-061-4/+4
| | | | | | | | Previously, SF code used the idx_to_attr[] array to compute the location of entries in the VUE map. This array didn't properly account for gl_PointSize. Now we use the VUE map directly. Reviewed-by: Eric Anholt <[email protected]>