path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* autoconf: pass -Wall to automake | Dylan Noblesmith | 2012-04-29 | 2 | -6/+6

  And fix these warnings that appear at autoreconf time:

  "`:='-style assignments are not portable"

  v2: Fix the recently-converted-to-automake r600.
* i965/fs: Fix FB writes that tried to use the non-existent m16 register. | Kenneth Graunke | 2012-04-27 | 1 | -1/+4

  A little analysis shows that the worst-case value for "nr" is 17:
  - base_mrf = 2 ... 2
  - header present (say gen == 5) ... 4
  - aa_dest_stencil_reg (stencil test) ... 5
  - SIMD16 mode: += 4 * reg_width ... 13
  - source_depth_to_render_target ... 15
  - dest_depth_reg ... 17

  This resulted in us setting base_mrf to 2 and mlen to 15. In other words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also, the instruction scheduler data structures use arrays of size 16, so this would cause us to access them out of bounds.

  While the debugger system routine may need m0 and m1, we don't use it today, so the simplest solution is just to move base_mrf back to 1. That way, our worst case message fits in m1..m15, which is legal.

  An alternative would be to fail on SIMD16 in this case, but that seems a bit unfortunate if there's no real need to reserve m0 and m1.

  Fixes new piglit test shaders/depth-test-and-write on Ironlake.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
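The worst-case arithmetic in the message above can be replayed with a small sketch. This is an illustrative model, not the driver's code: the struct, its field names, and `fb_write_last_mrf` are hypothetical, and the per-item register costs are inferred from the running totals listed in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: names are hypothetical and the register costs
 * are inferred from the running totals in the commit message. */
struct fb_write_payload {
    bool header_present;   /* +2 registers (e.g. gen == 5) */
    bool aa_dest_stencil;  /* +1 register */
    int  reg_width;        /* 1 for SIMD8, 2 for SIMD16 */
    bool source_depth;     /* +reg_width registers */
    bool dest_depth;       /* +reg_width registers */
};

/* Returns the index of the last message register the FB write would use. */
int fb_write_last_mrf(int base_mrf, const struct fb_write_payload *p)
{
    int nr = base_mrf;

    assert(p->reg_width == 1 || p->reg_width == 2);
    if (p->header_present)
        nr += 2;
    if (p->aa_dest_stencil)
        nr += 1;
    nr += 4 * p->reg_width;          /* color payload */
    if (p->source_depth)
        nr += p->reg_width;
    if (p->dest_depth)
        nr += p->reg_width;

    return nr - 1;                   /* message spans m[base_mrf]..m[nr - 1] */
}
```

With every option enabled in SIMD16 mode, a base of 2 ends at m16 (which the message says does not exist pre-Gen6), while a base of 1 ends at m15.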
* i965/vs: Fix texelFetchOffset() | Eric Anholt | 2012-04-24 | 1 | -3/+23

  It appears that when using 'ld' with the offset bits, address bounds checking happens before the offset is applied, so parts of the drawing in piglit texelFetchOffset() with a negative texcoord go black.
* i965/fs: Fix texelFetchOffset() | Eric Anholt | 2012-04-24 | 1 | -6/+21

  It appears that when using 'ld' with the offset bits, address bounds checking happens before the offset is applied, so parts of the drawing in piglit texelFetchOffset() with a negative texcoord go black.
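The two texelFetchOffset() messages describe an ordering problem: the bounds check sees the coordinate before the offset is added. The sketch below models that on a 1D array of integers. It is an assumption-laden illustration: both function names are hypothetical, and the second function shows one plausible workaround (adding the offset to the coordinate up front), which the commit messages themselves do not spell out.

```c
#include <assert.h>

#define BLACK 0

/* Models the behaviour described in the commit message: the bounds check
 * is applied to the un-offset coordinate. */
int fetch_hw_offset(const int *texels, int width, int coord, int offset)
{
    assert(width > 0);
    if (coord < 0 || coord >= width)
        return BLACK;                 /* rejected before the offset applies */

    int c = coord + offset;
    return (c < 0 || c >= width) ? BLACK : texels[c];
}

/* Assumed workaround: fold the offset into the coordinate first, so the
 * bounds check sees the final address. */
int fetch_manual_offset(const int *texels, int width, int coord, int offset)
{
    int c = coord + offset;

    assert(width > 0);
    return (c < 0 || c >= width) ? BLACK : texels[c];
}
```

For a coordinate of -1 with an offset of +2, the first function returns black even though the final address is valid; the second returns the texel at index 1.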
* i965: Convert live interval computation to using live variable analysis. | Eric Anholt | 2012-04-19 | 1 | -39/+26

  Our previous live interval analysis just said that anything in a loop was live for the whole loop. If you had to spill a reg in a loop, then we would consider the unspilled value live across the loop too, so you never made progress by spilling. Eventually it would consider everything in the loop unspillable and fail out.

  With the new analysis, things completely deffed and used inside the loop won't be marked live across the loop, so even if you spill/unspill something that used to be live across the loop, you reduce register pressure. But you usually don't even have to spill any more, since our intervals are smaller than before.

  This fixes an assertion failure when trying to compile the shader for the "glyphy" text rasterizer and piglit glsl-fs-unroll-explosion. Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing more shaders to be compiled in 16-wide mode.
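To make the difference concrete, here is a minimal sketch of textbook backward live-variable dataflow over a tiny control-flow graph. It is not the i965 implementation: the `struct block` layout, the 32-variable bitset, and `compute_live_variables` are assumptions made for illustration. Each block's `use` set holds variables read before being written in that block, and `def` holds variables written before any read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCC 2

/* Hypothetical block record: one bit per variable, up to 32 variables. */
struct block {
    uint32_t def;             /* written before any read in this block */
    uint32_t use;             /* read before any write in this block */
    int      succ[MAX_SUCC];  /* successor block indices, -1 for none */
    uint32_t livein;
    uint32_t liveout;
};

/* Iterates liveout = union of successors' livein and
 * livein = use | (liveout & ~def) until nothing changes. */
void compute_live_variables(struct block *blocks, int count)
{
    bool progress = true;

    assert(count >= 0);
    for (int b = 0; b < count; b++) {
        blocks[b].livein = 0;
        blocks[b].liveout = 0;
    }

    while (progress) {
        progress = false;
        for (int b = count - 1; b >= 0; b--) {
            uint32_t out = 0;
            for (int s = 0; s < MAX_SUCC; s++) {
                if (blocks[b].succ[s] >= 0)
                    out |= blocks[blocks[b].succ[s]].livein;
            }
            uint32_t in = blocks[b].use | (out & ~blocks[b].def);
            if (in != blocks[b].livein || out != blocks[b].liveout) {
                blocks[b].livein = in;
                blocks[b].liveout = out;
                progress = true;
            }
        }
    }
}
```

In a three-block graph where block 1 is a loop body that both defines and uses a temporary, the temporary never appears in that block's `livein` or `liveout`, so it is not considered live across the loop. A value defined before the loop and read inside it still is.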
* i965: Move the old live interval analysis code next to the new live vars code. | Eric Anholt | 2012-04-19 | 2 | -122/+122

  I'm about to replace the insides of this using the new analysis.
* i965: Add support for live variable analysis using dataflow analysis. | Eric Anholt | 2012-04-19 | 3 | -0/+245
* i965: Add basic block generator. | Eric Anholt | 2012-04-19 | 5 | -0/+392

  This takes the fs_inst list generated by the visitor, and generates a list of basic blocks with edges between them. This is a building block for data-flow analysis.
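As a rough illustration of the partitioning half of that job, the sketch below splits a flat instruction list into basic blocks at structured control-flow opcodes. It only assigns block numbers and does not build edges. The opcode enum, the choice of which opcodes end or begin a block, and `assign_blocks` are simplifying assumptions, not the generator this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced opcode set. */
enum opcode { OP_ALU, OP_IF, OP_ELSE, OP_ENDIF, OP_DO, OP_WHILE };

/* Writes the block index of each instruction into block_of[] and returns
 * the number of blocks. Assumed rules: IF, ELSE and WHILE end the current
 * block; ENDIF and DO begin a new one because they are jump targets. */
int assign_blocks(const enum opcode *insts, int count, int *block_of)
{
    int block = -1;
    bool start_new = true;

    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (insts[i] == OP_ENDIF || insts[i] == OP_DO)
            start_new = true;

        if (start_new) {
            block++;
            start_new = false;
        }
        block_of[i] = block;

        if (insts[i] == OP_IF || insts[i] == OP_ELSE || insts[i] == OP_WHILE)
            start_new = true;
    }
    return block + 1;
}
```

An if/else/endif sequence with straight-line code around it comes out as four blocks: the code up to and including the IF, the then-branch ending in ELSE, the else-branch, and everything from ENDIF onward.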
* i965/fs: Suppress printing the whole loop in BRW_OPCODE_DO annotation. | Eric Anholt | 2012-04-19 | 1 | -0/+2
* i965: Rename BRW_MAX_SURFACES to BRW_MAX_WM_SURFACES. | Kenneth Graunke | 2012-04-18 | 2 | -4/+4

  Now that we use separate binding tables for WM, VS, and GS, and have BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't have an unqualified BRW_MAX_SURFACES macro. It's confusing.

  Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix outdated comments about binding tables. | Kenneth Graunke | 2012-04-18 | 1 | -12/+8

  They had a number of issues:
  - A paragraph states that we use a single binding table, but we don't.
  - We labelled the WM binding table diagram as SOL/WM.
  - The WM diagram had an "Only relevant to the WM" comment. Duh.

  Signed-off-by: Kenneth Graunke <[email protected]>
* nouveau: rework and simplify nv04/nv05 driver a bit | Ben Skeggs | 2012-04-14 | 8 | -300/+215

  TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if you use any other graph object in the meantime they'll forget their state and spew a lovely METHOD_CNT error at you when you try to draw.

  The pre-newlib driver has a flush_notify() hook which does this state re-emit, and a number of random workarounds like extra flushes and state dirtying after various operations to solve this issue.

  I'm taking a slightly different approach to things instead, which has the nice side-effect of removing the divergent code-paths for ttri/mtri, the flush/dirty workarounds and the need for flush_notify.

  Also gives a few FPS boost in OA, yay.
* nouveau/vieux: switch to libdrm_nouveau-2.0 | Ben Skeggs | 2012-04-14 | 46 | -1890/+1844
* i965: fix typo | Dylan Noblesmith | 2012-04-13 | 1 | -1/+1

  Noticed by clang:

  brw_wm_surface_state.c:330:30: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
      [MESA_FORMAT_Z24_S8] = 0,
                             ^
  brw_wm_surface_state.c:326:30: note: previous initialization is here
      [MESA_FORMAT_Z24_S8] = 0,
                             ^

  No functionality change, since the array is declared static so it was zero-initialized by default.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
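The "no functionality change" remark rests on a C rule: array elements that a designated-initializer list does not mention are zero-initialized. A minimal sketch, using a hypothetical format enum and table rather than the driver's real ones:

```c
#include <assert.h>

/* Hypothetical formats; the real table is indexed by MESA_FORMAT_* values. */
enum format { FORMAT_A, FORMAT_B, FORMAT_C, FORMAT_COUNT };

static const int supported[FORMAT_COUNT] = {
    [FORMAT_A] = 1,
    [FORMAT_C] = 1,
    /* FORMAT_B is not listed, so its entry is implicitly 0. */
};

int format_supported(enum format f)
{
    assert((int)f >= 0 && f < FORMAT_COUNT);
    return supported[f];
}
```

Listing the same index twice, as in the clang warning above, is legal C, but it usually means one of the two lines was meant to name a different entry.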
* i965: When the kernel lacks the LLC check, assume it's present on gen >= 6. | Eric Anholt | 2012-04-11 | 1 | -3/+7

  The param wasn't added until drm-intel-next for 3.4, so we were missing our various LLC fast-paths.
* intel: Drop backwards compat code for not having libdrm with the LLC check. | Eric Anholt | 2012-04-11 | 1 | -4/+0
* i965/fs: Avoid generating extra AND instructions on bool logic ops. | Eric Anholt | 2012-04-11 | 1 | -22/+14

  By making a bool fs_reg only have a defined low bit (matching CMP output), instead of being a full 0 or 1 value, we reduce the ANDs generated in logic chains like:

      if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
          v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
          discard;

  My concern originally when writing this code was that we would end up generating unnecessary ANDs on bool uniforms, so I put the ANDs right at the point of doing the CMPs that otherwise set only the low bit. However, in order to use a bool, we're generating some instruction anyway (e.g. moving it so as to produce a condition code update), and those instructions can often be turned into an AND at that point.

  It turns out in the shaders I have on hand, none of them regress in instruction count:

  Total instructions: 262649 -> 262545
  39/2148 programs affected (1.8%)
  14253 -> 14149 instructions in affected programs (0.7% reduction)
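The idea of a boolean whose only meaningful bit is bit 0 can be mimicked in plain C. In this sketch, `cmp_result` stands in for a CMP whose upper bits are undefined; the junk pattern is arbitrary and both function names are hypothetical. The chain of ORs works on the raw values and a single mask is applied where the result is consumed, instead of one mask per comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a CMP result: only bit 0 is meaningful. The upper bits are
 * filled with an arbitrary pattern to show that they must not be relied on. */
uint32_t cmp_result(bool condition)
{
    return 0xabcd0000u | (condition ? 1u : 0u);
}

/* Mirrors the discard test from the commit message: OR the raw results
 * together, then mask once at the point of use. */
bool should_discard(float x, float y, float texwidth)
{
    uint32_t outside = cmp_result(x < 0.0f) | cmp_result(x > texwidth) |
                       cmp_result(y < 0.0f) | cmp_result(y > 1.0f);

    assert(texwidth >= 0.0f);
    return (outside & 1u) != 0;
}
```

Four comparisons cost one AND here rather than four, which is the kind of saving the instruction counts above reflect.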
* i965/fs: Try to avoid generating extra MOVs to do saturates. | Eric Anholt | 2012-04-11 | 3 | -12/+54

  This change (before the previous two) produced a .23% +/- .11% performance improvement in Unigine Tropics at 1024x768 on IVB.

  Total instructions: 269270 -> 262649
  614/2148 programs affected (28.6%)
  179386 -> 172765 instructions in affected programs (3.7% reduction)

  v2: Move some of the logic of finding the instruction that produced the result of an expression tree to a helper.
* i965: Stop lying about cpp and height of a stencil buffer. | Paul Berry | 2012-04-10 | 5 | -45/+66

  When using a separate stencil buffer, i965 requires that the pitch of the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x the actual pitch.

  Previously this was accomplished by doubling the "cpp" and "pitch" values stored in the intel_region data structure, and halving the height. However, this was confusing, and it led to a subtle (but benign) bug: since a stencil buffer is W-tiled, its true height must be aligned to a multiple of 64; we were accidentally aligning its faux height to a multiple of 64, causing memory to be wasted.

  Note that for window system stencil buffers, the DDX also doubles the cpp and pitch values. To facilitate fixing this DDX server bug in the future, we fix the cpp and pitch values we receive from the X server only if cpp has the "incorrect" value of 2.

  Acked-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>

  v2: Clarify comments about the DDX.
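The wasted memory mentioned above falls out of where the alignment is applied. The sketch below compares the two bookkeeping schemes for a stencil buffer with one byte per pixel. It is a simplified model: the function names are hypothetical, pitch alignment is ignored, and the order of halving and aligning in the old scheme is an assumption.

```c
#include <assert.h>

#define ALIGN_UP(value, alignment) \
    (((value) + (alignment) - 1) / (alignment) * (alignment))

/* Old bookkeeping (assumed order): double the pitch, halve the height,
 * then align the halved "faux" height to 64 rows. */
unsigned stencil_bytes_faux(unsigned pitch, unsigned height)
{
    assert(pitch > 0 && height > 0);
    return (2 * pitch) * ALIGN_UP(height / 2, 64u);
}

/* New bookkeeping: keep the true pitch and align the true height. */
unsigned stencil_bytes_true(unsigned pitch, unsigned height)
{
    assert(pitch > 0 && height > 0);
    return pitch * ALIGN_UP(height, 64u);
}

/* The doubling is applied only where the hardware command needs it. */
unsigned stencil_programmed_pitch(unsigned pitch)
{
    assert(pitch > 0);
    return 2 * pitch;
}
```

For a 128-byte pitch and 64 rows, the true layout needs 8192 bytes while the faux layout reserves 16384, because 32 faux rows are rounded up to 64 at double the pitch.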
* i965: Add support for sampling texture buffer objects on gen7+. | Eric Anholt | 2012-04-09 | 4 | -1/+71

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add real support for texturing/rendering with MESA_FORMAT_RGBA8888_REV. | Eric Anholt | 2012-04-09 | 1 | -5/+1

  This was hacked in in one place for EGL image stuff, but the right thing to do was just to provide the mapping from the mesa format to the native hardware format, which includes render target support. This turns out to be required for GL_ARB_texture_buffer_object, which sees data in this layout.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen7: Fix the /* ignored */ comment on constant surface setup. | Eric Anholt | 2012-04-09 | 1 | -1/+1

  It turns out this field *is* used, and it's the stride between samples from the buffer. Discovered during TBO debugging.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove vestiges of function call support from the old VS backend. | Kenneth Graunke | 2012-04-09 | 4 | -188/+0

  This never worked. brwProgramStringNotify also explicitly rejects programs that use CAL and RET. So there's no need for this to exist.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i915: set SPRITE_POINT_ENABLE bit correctly | Yuanhan Liu | 2012-04-09 | 4 | -12/+48

  When the SPRITE_POINT_ENABLE bit is set, the texture coordinates are replaced. This is only needed when the application has called something like glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE). Moreover, since we currently handle varying inputs as texture coordinates, we must be careful to set this bit only when it is needed; otherwise the values of varying inputs are changed and come out wrong.

  Thus we set the SPRITE_POINT_ENABLE bit only when all enabled texture coordinate units need CoordReplace. Otherwise a fallback is needed to make sure the rendering is right.

  With the bit setup handled in i915_update_sprite_point_enable(), we no longer need the related code in i915Enable.

  This patch would _really_ fix the webglc point-size.html test case and, of course, does not regress the piglit point-sprite and glean-pointSprite test cases.

  NOTE: This is a candidate for stable release branches.

  v2: fallback just when all enabled tex coord units need do CoordReplace (Eric)
  v3: move the sprite point validate code at I915InvalidateState (Eric)
  v4: sprite point enable bit update based on _NEW_PROGRAM, too; add relevant _NEW-state comments to show what state is being used (Eric)

  Signed-off-by: Yuanhan Liu <[email protected]>
* i965: Actually upload sampler state pointers for the VS unit on Gen6. | Kenneth Graunke | 2012-04-05 | 1 | -1/+1

  We already program all the sampler state correctly, we just didn't give the GPU a pointer to it for the VS stage. Thus, any texturing other than texelFetch() wouldn't work.

  Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's glsl-bif-tex subtests.

  NOTE: This is a candidate for the 8.0 branch.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* glsl: Demote 'type' from ir_instruction to ir_rvalue and ir_variable. | Kenneth Graunke | 2012-04-02 | 1 | -1/+1

  Variables have types, expression trees have types, but statements don't. Rather than have a nonsensical field that stays NULL in the base class, just move it to where it makes sense.

  Fix up a few places that lazily used ir_instruction even though they actually knew the particular subclass.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965/aub: Dump a final bitmap from DestroyContext. | Kenneth Graunke | 2012-04-02 | 3 | -29/+41

  Certain applications don't call SwapBuffers before exiting. Yet, we'd really like to see a bitmap containing the final rendered image even if they choose never to present it.

  In particular, Piglit tests (at least with -auto -fbo) fall into this category. Many of them failed to dump any images at all.

  Dumping one final image at context destruction time seems to work. We may wish to pursue a more elegant solution later.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* intel: add PCI IDs for Ivy Bridge GT2 server variant | Eugeni Dodonov | 2012-04-01 | 2 | -1/+4

  Those IDs are used by Bromolow.

  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Eugeni Dodonov <[email protected]>
* intel: Add some PCI IDs for Haswell. | Kenneth Graunke | 2012-03-30 | 2 | -2/+20

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eugeni Dodonov <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Set "Shader Channel Select" fields in Haswell's SURFACE_STATE. | Kenneth Graunke | 2012-03-30 | 3 | -1/+37

  These can be used to implement EXT_texture_swizzle without baking state-dependent swizzle instructions into the shader and forcing recompiles.

  For now, just set them to pass-through mode, so everything continues to work as it did on Ivybridge. We can optimize this later.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Fill in Sample Mask in Haswell's 3DSTATE_PS. | Kenneth Graunke | 2012-03-30 | 2 | -0/+5

  We only need one sample, since we don't support multisampling yet.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Set "Stencil Buffer Enable" bit on Haswell. | Kenneth Graunke | 2012-03-30 | 2 | -1/+5

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Set Line Stipple enable bit in 3DSTATE_SF for Haswell. | Kenneth Graunke | 2012-03-30 | 2 | -0/+5

  Apparently this needs to be the same as in 3DSTATE_WM.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Update max VS/PS threads shift offsets for Haswell. | Kenneth Graunke | 2012-03-30 | 4 | -4/+10

  These now start at bit 23 instead of bit 24/25.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Disable HiZ on Haswell for now. | Kenneth Graunke | 2012-03-30 | 1 | -1/+1

  Getting HiZ working means updating all the state packets for resolves and clears. It's not worth doing until we get the basics working.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Add initial IS_HASWELL() macros. | Kenneth Graunke | 2012-03-30 | 3 | -5/+14

  For now, these all return 0, as I don't yet want to enable Haswell support. Eventually they will be filled in with proper PCI IDs.

  Also add an is_haswell field similar to is_g4x to make it easy to distinguish Gen7 and Gen7.5.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Avoid explicit accumulator operands in SIMD16 mode on Gen7. | Kenneth Graunke | 2012-03-30 | 1 | -0/+3

  According to the BSpec ISA volume's "Accumulator Register" section:

  "[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is explicit source or destination operand."

  Fixes piglit tests:
  - fs-multiply-const-ivec4
  - fs-multiply-const-uvec4
  - fs-multiply-ivec4-const
  - fs-multiply-uvec4-const

  Signed-off-by: Kenneth Graunke <[email protected]>
* intel: fix un-blanced map_refcount issue | Yuanhan Liu | 2012-03-28 | 1 | -4/+4

  This is a regression introduced by commit cdcfd5, which forgot to increment map_refcount for a successfully mapped region and so left map_refcount unbalanced.

  This fixes the regression found in the following two webglc test cases on the Pineview platform:
  - texture-npot.html
  - gl-max-texture-dimensions.html

  Cc: Anuj Phogat <[email protected]>
  Signed-off-by: Yuanhan Liu <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
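The invariant at stake is that every successful map adds one reference and every unmap removes one. The following sketch shows that pairing on a toy region type. `struct region`, `region_map`, and `region_unmap` are hypothetical stand-ins, and the "mapping" simply points at embedded storage; this is not the intel_region code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy region: the map pointer is valid only while map_refcount > 0. */
struct region {
    char  storage[64];
    void *map;
    int   map_refcount;
};

void *region_map(struct region *r)
{
    if (r->map_refcount == 0)
        r->map = r->storage;      /* stand-in for the real mapping call */

    if (r->map != NULL)
        r->map_refcount++;        /* count only successful maps */

    return r->map;
}

void region_unmap(struct region *r)
{
    assert(r->map_refcount > 0);
    if (--r->map_refcount == 0)
        r->map = NULL;            /* last user gone: drop the mapping */
}
```

If the increment were skipped on the success path, the first unmap would drive the count out of step with the number of outstanding users, which is the kind of imbalance the commit describes.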
* intel: fix TFP at 16-bpp | Dave Airlie | 2012-03-25 | 1 | -6/+11

  don't ask why I had to debug this.

  tested to fix g-s and kwin at 16-bpp on Ironlake.

  Signed-off-by: Dave Airlie <[email protected]>
* intel: fix null dereference processing HiZ buffer | Dylan Noblesmith | 2012-03-22 | 1 | -0/+6

  Or technically, a near-null dereference.

  https://bugs.freedesktop.org/show_bug.cgi?id=46303
  https://bugs.freedesktop.org/show_bug.cgi?id=46739

  NOTE: This is a candidate for the 8.0 branch.

  Reviewed-by: Chad Versace <[email protected]>
* intel: Make use of the new GPU-unsynchronized map functionality in libdrm. | Eric Anholt | 2012-03-21 | 1 | -1/+3

  Improves Unigine Tropics performance at 1024x768 by 2.06236% +/- 0.50272% (n=11).

  Reviewed-by: Chris Wilson <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Drop the tracking of bo_map vs bo_map_gtt for unmapping. | Eric Anholt | 2012-03-21 | 2 | -15/+2

  drm_intel_bo_unmap() supports both in the current libdrm version.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Avoid flushing the batch for busy BOs for ARB_mbr with INVALIDATE_BUFFER. | Eric Anholt | 2012-03-21 | 1 | -15/+20

  Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset the buffer object when its streaming wraps. Don't penalize it by flushing the batch at the wrap point, just allocate a new BO and get to using it.

  Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Handle devid overrides using libdrm. | Eric Anholt | 2012-03-21 | 1 | -19/+4

  Reviewed-by: Yuanhan Liu <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Ask libdrm to dump an AUB file if INTEL_DEBUG=aub. | Eric Anholt | 2012-03-21 | 3 | -0/+37

  It also asks for BMPs in the aub file at SwapBuffers time.

  Reviewed-by: Yuanhan Liu <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* drirc: Add missing XML attributes that made the driconf application whine. | Eric Anholt | 2012-03-21 | 1 | -4/+4

  These are used for pretty presentation of the application name in the UI.

  Tested-by: Kenneth Graunke <[email protected]>
* i965: Change the hiz-override env var to a driconf option. | Eric Anholt | 2012-03-20 | 3 | -28/+13

  The force-enable option is dropped, now that the hardware we were concerned about has HiZ on by default. Now, instead of doing INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.

  v2: Disable separate stencil on gen6 when HIZ is turned off. (previously, this had to be done manually in addition).

  Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965: Drop the INTEL_FORCE_GS environment variable. | Eric Anholt | 2012-03-20 | 1 | -5/+0

  This was a debug option during gen6 transform feedback bringup (and a similar one existed during gen4 bringup). However, it looks like we're done with that, and we don't anticipate it being used again, either for geometry shaders or transform feedback.

  Suggested by: Kenneth Graunke <[email protected]>
* intel: Drop the INTEL_NO_BLIT debug environment variable. | Eric Anholt | 2012-03-20 | 1 | -5/+3

  This was added in the i915/i965 merge from the i915 driver, but I don't recall it ever being used since then.

  Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Drop the INTEL_STRICT_CONFORMANCE environment variable. | Eric Anholt | 2012-03-20 | 3 | -44/+6

  If you want to test the graphics driver, you want to test it under the conditions that users will see, not some set of additional fallbacks. If you want to test swrast, run the swrast driver (or no_rast=true) instead.

  Reviewed-by: Kenneth Graunke <[email protected]>