path: root/src/mesa/drivers/dri/i965
Commit message (Author, Age, Files, Lines)
* intel: use automake conditionals for defining FEATURE_{ES1,ES2} (Andreas Boll, 2013-05-01, 1 file, -1/+10)
| | | | | | Removes the need for API_DEFINES. Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Fix textureGrad() with shadow samplers on Haswell. (Kenneth Graunke, 2013-05-01, 1 file, -2/+8)
| | | | | | | | | | | | The shadow comparator needs to be loaded into the Z component of the last DWord. Fixes es3conform's shadow_execution_vert and oglconform's shadow-grad advanced.textureGrad.1D tests on Haswell. NOTE: This is a candidate for stable branches. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Lower textureGrad() for samplerCubeShadow. (Kenneth Graunke, 2013-05-01, 3 files, -6/+27)
| | | | | | | | | | | | | | | | | | | | | | | According to the Ivybridge PRM, Volume 4 Part 1, page 130, in the section for the sample_d message: "The r coordinate contains the faceid, and the r gradients are ignored by hardware." This doesn't match GLSL, which provides gradients for all of the coordinates. So we would need to do some math to compute the face ID before using sample_d. We currently don't have any code to do that. However, we do have a lowering pass that converts textureGrad to textureLod, which solves this problem. Since textureGrad on three components is sufficiently obscure, it's not a performance path. For now, only handle samplerCubeShadow; we need tests for samplerCube and samplerCubeArray. Fixes es3conform's shadow_comparison_frag test on Haswell. NOTE: This is a candidate for stable branches. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Implement color clears using a simple shader in blorp. (Eric Anholt, 2013-04-30, 8 files, -14/+344)
| | | | | | | | | | | | | | | | | | | | | | | The upside is less CPU overhead in fiddling with GL error handling, the ability to use the constant color write message in most cases, and no GLSL clear shaders appearing in MESA_GLSL=dump output. The downside is more batch flushing and a total recompute of GL state at the end of blorp. However, if we're ever going to use the fast color clear feature of CMS surfaces, we'll need this anyway since it requires very special state setup. This increases the fail rate of some of the GLES3conform ARB_sync tests, because of the initial flush at the start of blorp. The tests already intermittently failed (because it's just a bad testing procedure), and we can return it to its previous fail rate by fixing the initial flush. Improves GLB2.7 performance 0.37% +/- 0.11% (n=71/70, outlier removed). v2: Rename the key member, use the core helper for sRGB, and use BRW_MASK_* enums, fix comment and indentation (review by Paul). v3: Rewrite a comment, drop a silly temporary variable (review by Ken) Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Make a Mesa core function for sRGB render encoding handling. (Eric Anholt, 2013-04-30, 2 files, -41/+15)
| | | | | | | | v2: const-qualify ctx, and add a comment about the function (recommended by Brian and Kenneth). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: Don't flush the batch at the end of blorp. (Eric Anholt, 2013-04-30, 3 files, -18/+19)
| | | | | | | | | | Improves GLB2.7 performance 0.13% +/- 0.09% (n=104/105, outliers removed). More importantly, once color glClear()s are done through blorp in the next commit, this reduces regression in GLES3 conformance tests that rely on queueing up many glClear()s and having the GPU report being still busy in an ARB_sync query after that. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove the last spans code! (Eric Anholt, 2013-04-30, 2 files, -4/+0)
| | | | | | | | The remaining bits happen to do nothing that _swrast_span_render_start()/finish() don't do. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965/fs: Print out the estimated cycle count in INTEL_DEBUG=wm (Eric Anholt, 2013-04-29, 1 file, -0/+5)
| | | | | | | This could be used by shader-db for hopefully more accurate regression testing. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow LRPs with uniform registers. (Eric Anholt, 2013-04-29, 3 files, -1/+11)
| | | | | | | | | Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62). v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken) Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: Disable Z16 on contexts that don't require it. (Eric Anholt, 2013-04-29, 1 file, -1/+14)
| | | | | | | | | | | | | | It appears that Z16 on Intel hardware is in fact slower than Z24, so people are getting surprisingly hurt when trying to use Z16 as a performance-versus-precision tradeoff, or when they're targeting GLES2 and that's all you get. GL 3.0+ have Z16 on the list of required exact format sizes, but GLES doesn't, so choose the better-performing layout in that case. Improves GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB system. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Fold the one last function intel_tex_format.c into the caller. (Eric Anholt, 2013-04-29, 2 files, -2/+0)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move is_math/is_tex/is_control_flow() to backend_instruction. (Kenneth Graunke, 2013-04-29, 6 files, -76/+49)
| | | | | | | | | | | | | | | These are entirely based on the opcode, which is available in backend_instruction. It makes sense to only implement them in one place. This changes the VS implementation of is_tex() slightly, which now accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those aren't generated in the VS anyway, it should be fine. This also makes is_control_flow() available in the VS. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Don't try to use bogus interpolation modes pre-Gen6. (Chris Forbes, 2013-04-30, 1 file, -9/+17)
| | | | | | | | | | | | | | | | | | | | | Interpolation modes other than perspective-barycentric-pixel-center (and their associated coefficients in the WM payload) only exist in Gen6 and later. Unfortunately, if a varying was declared as `centroid`, we would blindly read the nonexistent values, and so produce all manner of bad behavior -- texture swimming, snow, etc. Fixes rendering in Counter-Strike Source and Team Fortress 2 on Ironlake. NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Jordan Justen <[email protected]>
* i965/vs: Fix order of source arguments to LRP. (Matt Turner, 2013-04-28, 1 file, -1/+4)
| | | | | | | The order of arguments matches DirectX, and is backwards from GLSL's mix() built-in. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63983
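The operand reversal can be sketched in plain C. This is only an illustration, assuming the usual definitions: GLSL mix(x, y, a) = x*(1-a) + y*a, and a DirectX-style lrp(s0, s1, s2) = s0*s1 + (1-s0)*s2. The function names are hypothetical and none of this is the driver's code.

```c
/* Hypothetical scalar models, not the i965 implementation. */

/* DirectX-style LRP: the interpolation factor is the first operand. */
static float lrp_dx(float a, float y, float x)
{
    return a * y + (1.0f - a) * x;
}

/* GLSL mix(): the interpolation factor is the last operand. */
static float mix_glsl(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* Emitting mix(x, y, a) as an LRP therefore reverses the operand order. */
static float mix_via_lrp(float x, float y, float a)
{
    return lrp_dx(a, y, x);
}
```

Passing the mix() operands through unchanged would swap the roles of the factor and the first endpoint, which is the kind of mistake the commit title describes.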
* i965/vs: Add support for LRP instruction. (Matt Turner, 2013-04-25, 5 files, -3/+22)
| | | | | | | | | | Only 13 affected programs in shader-db, but they were all helped. total instructions in shared programs: 368877 -> 368851 (-0.01%) instructions in affected programs: 1576 -> 1550 (-1.65%) Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Add a function to fix-up uniform arguments for 3-src insts. (Matt Turner, 2013-04-25, 2 files, -0/+25)
| | | | | | | | | | | | | | | | | Three-source instructions have a vertical stride overloaded to 4, which prevents directly using vec4 uniforms as arguments. Instead we need to insert a MOV instruction to do the replication for the three-source instruction. With this in place, we can use three-source instructions in the vertex shader. While some thought needs to go into deciding whether it's better to use a three-source instruction rather than a sequence of equivalent instructions (when one or more sources are uniforms or immediates), this will allow us to skip a lot of ugly lowering code and use the BFE and BFI2 instructions directly. Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Avoid recompiles for fragment clamping on non-clamping APIs. (Eric Anholt, 2013-04-25, 2 files, -2/+2)
| | | | | | | | | | Removes 75/78 state-dependent recompiles in GLB2.7 (the remaining 3 are due to FBO-rendering size predictions). We currently expose GL_ARB_color_buffer_float on GL core, so we may mis-predict there, but I'm about to send a patch for removing that silly extension in that case. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: report correct sample positions (Chris Forbes, 2013-04-25, 1 file, -4/+4)
| | | | | | | | | From low to high bits, the sample positions are packed y0,x0,y1,x1... Fixes arb_texture_multisample-sample-position piglit. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
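A rough sketch of unpacking that layout follows. The commit message gives only the field order (y0, x0, y1, x1, ... from low to high bits); the 4-bit field width, the 1/16-pixel units, and the function name are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumption: each coordinate is a 4-bit field in 1/16-pixel units. */
static void unpack_sample_position(uint32_t packed, unsigned index,
                                   float *x, float *y)
{
    /* One sample occupies 8 bits: y in the low nibble, x in the high one. */
    uint32_t bits = (packed >> (8 * index)) & 0xff;

    *y = (float)(bits & 0xf) / 16.0f;
    *x = (float)(bits >> 4) / 16.0f;
}
```

Reading x from the low nibble instead would transpose every reported position, which matches the kind of error the commit title describes.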
* i965/gen7: fix encoding of (huge) surface size for BRW_SURFACE_BUFFER (Chia-I Wu, 2013-04-24, 1 file, -6/+10)
| Unlike GEN6, the bits of entry count are distributed like this
|
|   width  = (entry_count & 0x0000007f);       /* bits [6:0] */
|   height = (entry_count & 0x001fff80) >> 7;  /* bits [20:7] */
|   depth  = (entry_count & 0x7fe00000) >> 21; /* bits [30:21] */
|
| The maximum entry count is still limited to 2^27.
|
| This was noted while going over the PRM. No test is impacted, because
| 1<<20 (the bit that moved) is much larger than
| GL_UNIFORM_BLOCK_MAX_SIZE, GL_MAX_TEXTURE_BUFFER_SIZE, or
| MAX_*_UNIFORM_COMPONENTS.
|
| v2: Explain more in the commit message (by anholt)
|
| Reviewed-by: Eric Anholt <[email protected]>
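The split quoted in the message can be wrapped in a small helper to see where a given entry count lands. The masks and shifts are copied from the commit message; the struct and function names are hypothetical, and the sketch does not model any other adjustment the driver may make before encoding.

```c
#include <stdint.h>

/* Hypothetical container for the three surface-size fields. */
struct buffer_surface_size {
    uint32_t width;
    uint32_t height;
    uint32_t depth;
};

/* Split an entry count using the Gen7 bit ranges quoted in the message. */
static struct buffer_surface_size
gen7_split_entry_count(uint32_t entry_count)
{
    struct buffer_surface_size s;

    s.width  =  entry_count & 0x0000007f;         /* bits [6:0]   */
    s.height = (entry_count & 0x001fff80) >> 7;   /* bits [20:7]  */
    s.depth  = (entry_count & 0x7fe00000) >> 21;  /* bits [30:21] */
    return s;
}
```

With this layout an entry count of 1<<20 still falls in the height field, and only 1<<21 and above reach depth.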
* i965/gen7: fix 3DSTATE_LINE_STIPPLE_PATTERN (Chia-I Wu, 2013-04-24, 1 file, -3/+14)
| | | | | | | | | | The inverse repeat count should take up bits 31:15 and is in U1.16. Fixes the "Restarting lines within a single Begin/End block" subtest of piglit linestipple, and gets the other failing subtests much closer to passing. v2: Rewrite commit message with more detailed piglit info (by anholt) Reviewed-by: Eric Anholt <[email protected]>
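As an illustration of the field described above, the following sketch encodes 1/factor as an unsigned fixed-point value with 1 integer bit and 16 fractional bits, then places it in bits 31:15 of a dword. The function name, the truncating conversion, and the assumption that factor is at least 1 are choices made for this example, not taken from the driver.

```c
#include <stdint.h>

/* Encode 1/factor as U1.16 and place it in bits 31:15 of a dword.
 * Assumption: factor >= 1, so the result never exceeds 1.0. */
static uint32_t pack_inverse_repeat_count(unsigned factor)
{
    /* U1.16: 1 integer bit and 16 fractional bits, so 1.0 == 1 << 16. */
    uint32_t inverse = (uint32_t)((1.0 / factor) * (1 << 16));

    return inverse << 15;
}
```

A 17-bit value shifted left by 15 fills the dword exactly up to bit 31, which is why a repeat factor of 1 sets only the top bit.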
* i965: fix SURFACE_STATE dumping (Chia-I Wu, 2013-04-24, 1 file, -4/+4)
| | | | | | Wrong fields were used when dumping width and height. Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove strange comments about math functions. (Matt Turner, 2013-04-24, 1 file, -3/+3)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove traces of nonexistent TAN math function. (Matt Turner, 2013-04-24, 2 files, -2/+1)
| | | | | | | Never existed? At least never supported. Doesn't appear in 965, G45, or ILK documentation. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: don't flag _NEW_DEPTH in Begin/EndQuery if driver implements the functions (Marek Olšák, 2013-04-24, 5 files, -5/+10)
| | | | | | | | | | | We don't want to set the flag for Gallium. I think only swrast needs the flag to be set for occlusion queries. v2: fix stats_wm updates in i965 Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* mesa: remove _NEW_PACKUNPACK (Marek Olšák, 2013-04-24, 1 file, -1/+0)
| | | | | | | | | | No driver checks the flag. Nobody uses it. I also removed the FLUSH_VERTICES calls, because PixelStorei has no effect on rendering. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* mesa: convert _NEW_RASTERIZER_DISCARD to a driver flag (Marek Olšák, 2013-04-24, 5 files, -8/+12)
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* mesa,i965: use NewDriverState to communicate TFB state changes with the driver (Marek Olšák, 2013-04-24, 6 files, -14/+22)
| _NEW_TRANSFORM_FEEDBACK is not used by core Mesa, so it can be removed.
| Instead, a new private flag is added to i965 to serve the same purpose.
|
| If you're new to this:
| * When creating a context, you can set private dirty flags in
|   gl_context::DriverFlags, e.g.:
|       ctx->DriverFlags.NewStateX = BRW_NEW_STATE_X;
| * When StateX is changed, core Mesa does:
|       ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
| * When you have to draw, read and clear ctx->NewDriverState.
| * Pros: not touching NewState, the driver decides the mapping between
|   GL states and hw state groups, unlimited number of flags in core Mesa
|   (still limited number of flags in the driver though)
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
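The three steps in that message can be put together in a small stand-alone model. The structs below are hypothetical stand-ins for gl_context and its DriverFlags member, the 64-bit field type is an assumption, and BRW_NEW_STATE_X is the placeholder name from the message rather than a real i965 flag.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the gl_context fields named in the message. */
struct driver_flags {
    uint64_t NewStateX;
};

struct context {
    struct driver_flags DriverFlags;
    uint64_t NewDriverState;
};

#define BRW_NEW_STATE_X (1ull << 0)  /* placeholder driver-private bit */

/* Driver, at context creation: choose the private bit for StateX. */
static void driver_init(struct context *ctx)
{
    ctx->DriverFlags.NewStateX = BRW_NEW_STATE_X;
    ctx->NewDriverState = 0;
}

/* Core: StateX changed, so raise whichever bit the driver registered. */
static void core_state_x_changed(struct context *ctx)
{
    ctx->NewDriverState |= ctx->DriverFlags.NewStateX;
}

/* Driver, at draw time: read and clear the accumulated flags. */
static uint64_t driver_take_dirty(struct context *ctx)
{
    uint64_t dirty = ctx->NewDriverState;

    ctx->NewDriverState = 0;
    return dirty;
}
```

The core side never needs to know what BRW_NEW_STATE_X means; it only ORs in the value the driver stored, which is what lets the driver decide the mapping between GL state and hardware state groups.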
* i965/fs: Don't save value returned by emit() if it's not used. (Matt Turner, 2013-04-22, 1 file, -11/+11)
| | | | | | Probably a copy-n-paste mistake. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix a mistake in the comments for software counters. (Kenneth Graunke, 2013-04-22, 1 file, -2/+2)
| | | | | | | The code doesn't set brw->query.obj to NULL, it sets query->bo to NULL. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Apply CMP NULL {Switch} work-around to other Gen7s. (Matt Turner, 2013-04-22, 1 file, -1/+4)
| | | | | | | Listed in the restrictions section of CMP, but not on the work-arounds page. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Introduce a globally-available minify() macro. (Eric Anholt, 2013-04-21, 1 file, -4/+4)
| | | | | | This matches u_minify()'s behavior, for consistency. Reviewed-by: Brian Paul <[email protected]>
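For readers unfamiliar with the helper, the behaviour being matched is assumed here to be the usual mipmap-size rule: shift the dimension right once per level and clamp the result to 1. The sketch below uses a hypothetical function name and is not the macro added by the commit.

```c
/* Assumed behaviour: halve once per mip level, never dropping below 1.
 * levels must be smaller than the bit width of unsigned. */
static unsigned minify_sketch(unsigned value, unsigned levels)
{
    unsigned shifted = value >> levels;

    return shifted > 1 ? shifted : 1;
}
```

The clamp matters for non-square textures, where one dimension reaches 1 several levels before the other.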
* Revert "i965: Check reg.nr for BRW_ARF_NULL instead of reg.file." (Matt Turner, 2013-04-18, 1 file, -1/+1)
| | | | | | | | | This reverts commit ecdda414d361ab4430fd5747c9217687c1f3d63f. Commit was supposed to be a simple typo fix. Clearly needs more investigating. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63688
* i965: Check reg.nr for BRW_ARF_NULL instead of reg.file. (Matt Turner, 2013-04-17, 1 file, -1/+1)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Implement work-around for CMP with null dest on Haswell. (Matt Turner, 2013-04-17, 1 file, -0/+12)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Fix hypothetical use of uninitialized data in attribute_map[]. (Paul Berry, 2013-04-17, 1 file, -0/+11)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes issue identified by Klocwork analysis: 'attribute_map' array elements might be used uninitialized in this function (vec4_visitor::lower_attributes_to_hw_regs). The attribute_map array contains the mapping from shader input attributes to the hardware registers they are stored in. vec4_vs_visitor::setup_attributes() only populates elements of this array which, according to core Mesa, are actually used by the shader. Therefore, when vec4_visitor::lower_attributes_to_hw_regs() accesses the array to lower a register access in the shader, it should in principle only access elements of attribute_map that contain valid data. However, if a bug ever caused the driver back-end to access an input that was not flagged as used by core Mesa, then lower_attributes_to_hw_regs() would access uninitialized memory, which could cause illegal instructions to get generated, resulting in a possible GPU hang. This patch makes the situation more robust by using memset() to pre-initialize the attribute_map array to zero, so that if such a bug ever occurred, lower_attributes_to_hw_regs() would generate a (mostly) harmless access to r0. In addition, it adds assertions to lower_attributes_to_hw_regs() so that if we do have such a bug, we're likely to discover it quickly. Reviewed-by: Jordan Justen <[email protected]>
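The defensive pattern described above (zero the table first, then assert on lookup) can be sketched as follows. The table size, the function names, and the convention that register 0 means "never set up" are assumptions for this example; the real code lives in the vec4 visitor classes named in the message.

```c
#include <assert.h>
#include <string.h>

#define NUM_ATTRIBUTES 8  /* hypothetical table size */

/* Record a register for every used attribute; unused slots stay 0.
 * Assumption: first_reg is nonzero, so 0 can mark an unpopulated slot. */
static void setup_attribute_map(int map[NUM_ATTRIBUTES],
                                unsigned used_mask, int first_reg)
{
    int reg = first_reg;

    /* Pre-initialize so a stray lookup reads 0 instead of garbage. */
    memset(map, 0, NUM_ATTRIBUTES * sizeof(map[0]));

    for (int i = 0; i < NUM_ATTRIBUTES; i++) {
        if (used_mask & (1u << i))
            map[i] = reg++;
    }
}

/* Look up an attribute, asserting that it was actually populated. */
static int lookup_attribute_reg(const int map[NUM_ATTRIBUTES], int attr)
{
    assert(attr >= 0 && attr < NUM_ATTRIBUTES);
    assert(map[attr] != 0);
    return map[attr];
}
```

In a release build the assertions compile away, and the memset() is what keeps a bad lookup pointing at a well-defined register rather than at uninitialized data.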
* i965: Trim trailing whitespace in brw_defines.h. (Eric Anholt, 2013-04-17, 1 file, -144/+144)
| | | | | | It was all over the formats section I wanted to edit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Shut up the last release build warning. (Eric Anholt, 2013-04-12, 1 file, -0/+1)
| | | | | | | I don't see a sensible value to use in this path, but we shouldn't ever hit this outside of developer new-texture-target enabling. Reviewed-by: Matt Turner <[email protected]>
* i965: Silence one more compile warning. (Eric Anholt, 2013-04-12, 1 file, -0/+1)
| | | | | | | | We don't want to store this thing in the class, and we do need the definition to be at the top of the function and held onto until the end here, so there's not much to do besides (void) reference it. Reviewed-by: Matt Turner <[email protected]>
* i965: Fix a warning in the release build. (Eric Anholt, 2013-04-12, 1 file, -2/+1)
| | | | | | | This was copy and pasted from can_reswizzle_dst(), and we can just fold it in instead to avoid the warning. Reviewed-by: Matt Turner <[email protected]>
* i965: Fix an unused variable warning in the release build. (Eric Anholt, 2013-04-12, 1 file, -4/+2)
| | | | | | | I think this actually clarifies what's going on in the asserts a bit, given how many regions we've got floating around. Reviewed-by: Matt Turner <[email protected]>
* i965: Fix an unused variable warning in the release build. (Eric Anholt, 2013-04-12, 1 file, -1/+0)
| | | | | | It's used in an assert, but we have this as a member of the class anyway. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Fix some untriggered optimization bugs with uncompressed/sechalf. (Eric Anholt, 2013-04-12, 1 file, -4/+4)
| | | | | | | | | We have this support for firsthalf/sechalf instructions, which would be called in the !has_compr4 (aka original gen4) 16-wide case. We currently only support 16-wide for gen5+, so we weren't tripping over this, but it would have been a problem if we ever try to enable it. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add basic-block-level dead code elimination. (Eric Anholt, 2013-04-12, 2 files, -0/+161)
| | | | | | | | | | | | | | | | | This is a poor substitute for proper global dead code elimination that could replace both our current paths, but it was very easy to write. It particularly helps with Valve's shaders that are translated out of DX assembly, which has been register allocated and thus have a bunch of unrelated uses of the same variable (some of which get copy-propagated from and then left for dead). shader-db results: total instructions in shared programs: 1735753 -> 1731698 (-0.23%) instructions in affected programs: 492620 -> 488565 (-0.82%) v2: Fix comment typo Reviewed-by: Matt Turner <[email protected]>
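To illustrate what a block-local pass of this kind can catch, here is a simplified sketch. It is not the algorithm from the commit: it walks one block backwards with a per-register "read later" flag, assumes every write fully overwrites its destination, assumes register numbers are below NUM_REGS, and treats every register as live at the end of the block because no global information is available. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 16  /* hypothetical virtual register count */

struct inst {
    int dst;     /* register written, or -1 for none */
    int src[2];  /* registers read, or -1 for unused */
    bool dead;   /* set by the pass */
};

/* Mark writes that are overwritten before being read within one block.
 * Returns the number of instructions marked dead. */
static size_t local_dce(struct inst *block, size_t count)
{
    bool live[NUM_REGS];
    size_t removed = 0;

    /* No global liveness: assume every register is read after the block. */
    for (int r = 0; r < NUM_REGS; r++)
        live[r] = true;

    for (size_t i = count; i-- > 0;) {
        struct inst *in = &block[i];

        if (in->dst >= 0 && !live[in->dst]) {
            in->dead = true;
            removed++;
            continue;  /* sources of a dead instruction are not real uses */
        }

        in->dead = false;
        if (in->dst >= 0)
            live[in->dst] = false;  /* earlier writes are now overwritten */
        for (int s = 0; s < 2; s++) {
            if (in->src[s] >= 0)
                live[in->src[s]] = true;
        }
    }
    return removed;
}
```

For the sequence r1 = r0; r2 = r1; r1 = r3; r2 = r4, the sketch marks the first two instructions dead, which mirrors the message's case of a register reused for unrelated values after register allocation.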
* i965/fs: Remove incorrect note of writing attr in centroid workaround. (Eric Anholt, 2013-04-12, 1 file, -1/+1)
| | | | | | | | This instruction doesn't update its IR destination, it just moves from payload to f0. This caused the dead code elimination pass I'm adding to dead-code-eliminate the first step of interpolation. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Add a helper function for checking for partial register updates. (Eric Anholt, 2013-04-12, 5 files, -22/+24)
| | | | | | | | These checks were all over, and every time I wrote one I had to try to decide again what the cases were for partial updates. v2: Fix inadvertent reladdr check removal. Reviewed-by: Matt Turner <[email protected]>
* mesa: Add a macro to bitset for determining bitset size. (Eric Anholt, 2013-04-12, 2 files, -3/+2)
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Fix compiler warnings since the introduction of texture multisample. (Eric Anholt, 2013-04-12, 1 file, -1/+1)
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/vs: Print error if vertex shader fails to compile. (Matt Turner, 2013-04-11, 1 file, -0/+4)
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: NULL check prog on shader compilation failure. (Matt Turner, 2013-04-11, 2 files, -7/+11)
| | | | | | Also change if (shader) to if (prog) for consistency. Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Don't hardcode DEBUG_VS in generic vec4 code. (Paul Berry, 2013-04-11, 5 files, -14/+25)
| | | | | | | | | | | | Since the vec4_visitor and vec4_generator classes are going to be re-used for geometry shaders, we can't enable their debug functionality based on (INTEL_DEBUG & DEBUG_VS) anymore. Instead, add a debug_flag boolean to these two classes, so that when they're instantiated the caller can specify whether debug dumps are needed. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>