path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* mesa: implement DrawTransformFeedback from ARB_transform_feedback2 | Marek Olšák | 2011-12-15 | 3 | -7/+14
  It's like DrawArrays, but the vertex count is taken from a transform feedback object. This removes DrawTransformFeedback from dd_function_table and adds the same function to GLvertexformat (with function parameters matching GL). The vbo_draw_func callback has a new parameter, "struct gl_transform_feedback_object *tfb_vertcount". The rest of the code just validates state and forwards the transform feedback object into vbo_draw_func.
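  The dispatch idea can be sketched in plain C. This is a hypothetical model, not Mesa code: `struct tfb_object`, `draw_prims()` and the two wrappers are invented names, and the sketch assumes the count is resolved at draw time from the object when `tfb_vertcount` is non-NULL, as the message describes for vbo_draw_func.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transform feedback object: it only
 * records how many vertices the last capture wrote. */
struct tfb_object {
   unsigned vertices_written;
};

/* Hypothetical draw entry point, loosely shaped like vbo_draw_func:
 * when tfb_vertcount is non-NULL, the vertex count comes from it and
 * the caller-supplied count is ignored. */
static unsigned draw_prims(unsigned count,
                           const struct tfb_object *tfb_vertcount)
{
   if (tfb_vertcount != NULL)
      count = tfb_vertcount->vertices_written;
   return count; /* a real driver would emit `count` vertices here */
}

/* DrawArrays-like path: the application supplies the count. */
static unsigned draw_arrays(unsigned count)
{
   return draw_prims(count, NULL);
}

/* DrawTransformFeedback-like path: the count comes from the object. */
static unsigned draw_transform_feedback(const struct tfb_object *tfb)
{
   return draw_prims(0, tfb);
}
```

  Routing both entry points through one callback keeps validation shared; only the source of the count differs.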
* i965: Drop separate stencil assertions in update_draw_buffer(). | Eric Anholt | 2011-12-14 | 1 | -16/+0
  The comment said they belonged in emit_depthbuffer, and at this point they were all there already.
  Reviewed-by: Kenneth Graunke
* intel: Simplify and touch up the FBO completeness test. | Eric Anholt | 2011-12-14 | 1 | -18/+21
  Now that we have miptrees for everything, we can more easily test for !has_separate_stencil completeness. Also, test whether the stencil rb has the wrong kind of format for separate stencil, or whether we are trying to do packed depth/stencil to different images of a single miptree.
  Reviewed-by: Kenneth Graunke
* intel: Remove another renderbuffer allocation path. | Eric Anholt | 2011-12-14 | 1 | -8/+4
  Now there are just two paths: one that CALLOCs and sets up the window system vtable, and one that CALLOCs and sets up the user renderbuffer vtable. The user renderbuffer vtable gets replaced later by intel_renderbuffer_update_wrapper for wrapped renderbuffers (those with name == ~0).
  Reviewed-by: Kenneth Graunke
* intel: Make the separate stencil RB storage path match texture more. | Eric Anholt | 2011-12-14 | 1 | -76/+52
  There were too many things making intel_renderbuffer *s and tweaking their bits.
  Reviewed-by: Kenneth Graunke
* intel: Move S8 width/height alignment to miptree creation. | Eric Anholt | 2011-12-14 | 3 | -55/+22
  We were doing it in the caller in the renderbuffer code, but it was missed in the separate stencil creation for textures. Apparently our testing was using renderbuffers or pre-aligned sizes.
  Reviewed-by: Kenneth Graunke
* intel: Drop check for wrapped_depth in RB mapping. | Eric Anholt | 2011-12-14 | 1 | -1/+1
  This used to be needed because irb->mt would be unset for fake packed depth/stencil, but that is no longer the case.
  Reviewed-by: Kenneth Graunke
* intel: Fix uninitialized values in debug output for renderbuffer mapping. | Eric Anholt | 2011-12-14 | 1 | -1/+1
* radeon: stop using _DepthBuffer, _StencilBuffer fields | Brian Paul | 2011-12-13 | 2 | -9/+8
  Reviewed-by: Eric Anholt
* nouveau: stop using _DepthBuffer, _StencilBuffer fields | Brian Paul | 2011-12-13 | 6 | -13/+14
  Reviewed-by: Eric Anholt
* mesa,intel: use _mesa_image_offset() for PBOs | nobled | 2011-12-08 | 1 | -2/+3
  This avoids needlessly forming invalid pointers, which is undefined behavior even if they are never dereferenced. It also makes _mesa_validate_pbo_access() more comprehensible.
  Reviewed-by: Brian Paul
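  The underlying technique, computing an offset as an integer and checking it against the buffer size before any pointer is formed, can be sketched as follows. The layout struct and both functions are hypothetical and deliberately much simpler than _mesa_image_offset(); overflow of the size_t arithmetic is not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical image description; not Mesa's real structures. */
struct image_layout {
   size_t row_stride;      /* bytes per row */
   size_t bytes_per_pixel;
};

/* Byte offset of pixel (x, y), computed as an integer rather than
 * by forming base + offset as a pointer first. */
static size_t image_offset(const struct image_layout *img, size_t x, size_t y)
{
   return y * img->row_stride + x * img->bytes_per_pixel;
}

/* Check that a width x height access starting at (x, y) stays inside a
 * buffer of buf_size bytes, using only integer arithmetic. */
static bool validate_access(const struct image_layout *img, size_t buf_size,
                            size_t x, size_t y, size_t width, size_t height)
{
   if (width == 0 || height == 0)
      return true; /* nothing is touched */
   /* One past the last byte touched by the access. */
   size_t end = image_offset(img, x + width - 1, y + height - 1)
                + img->bytes_per_pixel;
   return end <= buf_size;
}
```

  Only after validate_access() succeeds would the caller form `base + image_offset(...)`, so no out-of-range pointer ever exists.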
* mesa/drivers: use new swrast renderbuffer functions | Brian Paul | 2011-12-08 | 12 | -62/+74
  Reviewed-by: Eric Anholt
* mesa: rewrite accum buffer support | Brian Paul | 2011-12-08 | 2 | -2/+3
  Implemented in terms of renderbuffer mapping/unmapping and format packing/unpacking functions. The swrast and state tracker code for implementing accumulation is unused and will be removed in the next commit.
  v2: don't use memcpy() in _mesa_clear_accum_buffer()
  v3: don't allocate MAX_WIDTH arrays, be more careful with mapping flags
  Reviewed-by: Eric Anholt
* mesa: remove the ctx->Driver.IsTextureResident() hook | Brian Paul | 2011-12-08 | 1 | -1/+0
  No driver implemented this, and we always returned "True" for residency queries.
  Reviewed-by: Ian Romanick
* mesa: remove TextureMemCpy driver hook | Brian Paul | 2011-12-08 | 1 | -1/+0
  There's probably no reason to use a special version of memcpy() anymore.
* i965 gen6: Implement pass-through GS for transform feedback. | Paul Berry | 2011-12-07 | 6 | -46/+208
  In Gen6, transform feedback is accomplished by having the geometry shader send vertex data to the data port using "Streamed Vertex Buffer Write" messages, while simultaneously passing vertices through to the rest of the graphics pipeline (if rendering is enabled).
  This patch adds a geometry shader program that simply passes vertices through to the rest of the graphics pipeline. The rest of transform feedback functionality will be added in future patches.
  To make the new geometry shader easier to test, I've added an environment variable "INTEL_FORCE_GS". If this environment variable is enabled, then the pass-through geometry shader will always be used, regardless of whether transform feedback is in effect.
  On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no Piglit regressions.
  Reviewed-by: Kenneth Graunke
  Acked-by: Eric Anholt
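  The selection rule described above can be sketched as a small predicate. This is hypothetical, not the driver's code: the function names are invented, and the sketch assumes that merely setting INTEL_FORCE_GS (to any value) counts as enabled. The environment value is passed in so the rule can be tested without touching the process environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: force_env is the value of getenv("INTEL_FORCE_GS").
 * Assumption: any set value enables the pass-through GS. */
static bool need_passthrough_gs(bool transform_feedback_active,
                                const char *force_env)
{
   return transform_feedback_active || force_env != NULL;
}

/* Convenience wrapper that reads the real environment. */
static bool need_passthrough_gs_env(bool transform_feedback_active)
{
   return need_passthrough_gs(transform_feedback_active,
                              getenv("INTEL_FORCE_GS"));
}
```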
* i965: Clean up misleading defines for DWORD 2 of URB_WRITE header. | Paul Berry | 2011-12-07 | 5 | -24/+59
  R02_PRIM_END and R02_PRIM_START don't actually refer to bits in DWORD 2 of R0 (as the names, and comments in the code, would seem to indicate). Instead, they refer to bits in DWORD 2 of the header for URB_WRITE messages.
  This patch renames the defines to reflect what they actually mean. It also adds a define URB_WRITE_PRIM_TYPE_SHIFT, which was previously just hardcoded in .c files.
  Reviewed-by: Kenneth Graunke
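  Packing DWORD 2 of a URB_WRITE header from named fields might look like the sketch below. The macro names are hypothetical, and the bit positions (PrimEnd in bit 0, PrimStart in bit 1, PrimType shifted left by 2) are assumptions for illustration; check brw_defines.h for the real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit positions are assumed, not confirmed. */
#define URB_HDR_DW2_PRIM_END        0x1u
#define URB_HDR_DW2_PRIM_START      0x2u
#define URB_HDR_DW2_PRIM_TYPE_SHIFT 2

/* Build DWORD 2 from a primitive type and start/end flags, using a
 * named shift instead of a constant hardcoded in each .c file. */
static uint32_t pack_urb_write_dw2(uint32_t prim_type, int start, int end)
{
   uint32_t dw2 = prim_type << URB_HDR_DW2_PRIM_TYPE_SHIFT;
   if (start)
      dw2 |= URB_HDR_DW2_PRIM_START;
   if (end)
      dw2 |= URB_HDR_DW2_PRIM_END;
   return dw2;
}
```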
* i965 gs: Clean up dodgy register re-use, at the cost of a few MOVs. | Paul Berry | 2011-12-07 | 2 | -65/+111
  Prior to this patch, in the Gen4 and Gen5 GS, we used GRF 0 (called "R0" in the code) as a staging area to prepare the message header for the FF_SYNC and URB_WRITE messages. This cleverly avoided an unnecessary MOV operation (since the initial value of GRF 0 contains data that needs to be included in the message header), but it made the code confusing, since GRF 0 could no longer be relied upon to contain its initial value once the GS started preparing its first message. This patch avoids confusion by using a separate register ("header") as the staging area, at the cost of one MOV instruction.
  Worse yet, prior to this patch, the GS would completely overwrite the contents of GRF 0 with the writeback data it received from a completed FF_SYNC or URB_WRITE message. It did this because DWORD 0 of the writeback data contains the new URB handle, and that needs to be included in DWORD 0 of the next URB_WRITE message header. However, that caused the rest of the message header to be corrupted with either undefined data or zeros. Astonishingly, this did not produce any known failures (probably by dumb luck). However, it seems really dodgy; corrupting FFTID in particular seems likely to cause GPU hangs. This patch avoids the corruption by storing the writeback data in a temporary register and then copying just DWORD 0 to the header for the next message. This costs one extra MOV instruction per message sent, except for the final message.
  Also, this patch moves the logic for overriding DWORD 2 of the header (which contains PrimType, PrimStart, PrimEnd, and some other data that we don't care about yet) out of brw_gs_emit_vue() and into the function brw_gs_overwrite_header_dw2(). This saves one MOV instruction in brw_gs_quads() and brw_gs_quad_strip(), and paves the way for the Gen6 GS, which will need more complex logic to override DWORD 2 of the header.
  Finally, the function brw_gs_alloc_regs() contained a benign bug: it neglected to increment the register counter when allocating space for the "temp" register. This turned out not to have any effect because the temp register wasn't used on Gen4 and Gen5, the only hardware models (so far) to require a GS program. Now, all the registers allocated by brw_gs_alloc_regs() are actually used, and properly accounted for.
  Reviewed-by: Kenneth Graunke
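  The data flow of the fix can be modeled on the CPU. This is a hypothetical sketch, not EU assembly or Mesa code: a GRF is modeled as eight 32-bit DWORDs, init_header() stands for the one extra MOV from R0, and apply_writeback() copies only DWORD 0 (the new URB handle) from the writeback temporary, leaving the rest of the header and R0 itself intact.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model a GRF register as eight DWORDs (256 bits). */
typedef struct { uint32_t dw[8]; } grf;

/* Seed the separate "header" register from R0's initial contents:
 * the one extra MOV the patch accepts. */
static void init_header(grf *header, const grf *r0)
{
   memcpy(header, r0, sizeof(*header));
}

/* Writeback lands in a temporary; only DWORD 0 (the new URB handle)
 * is copied into the next message header. */
static void apply_writeback(grf *header, const grf *writeback)
{
   header->dw[0] = writeback->dw[0];
}
```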
* i965 gen6: Allocate URB space for GS | Paul Berry | 2011-12-07 | 3 | -12/+63
  When the GS is not in use, the entire URB space is available for the VS. When the GS is in use, we split the URB space 50/50.
  The 50/50 split is probably not optimal; we'll probably want to tune this for performance in a future patch. For example, in most situations, it's probably worth allocating more than 50% of the space to the VS, since VS space is used for vertex caching. But for now this is good enough.
  Based on previous work by: Kenneth Graunke
  Reviewed-by: Eric Anholt
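  The split policy reduces to simple arithmetic, sketched below. Units are arbitrary and hypothetical; giving an odd remainder to the GS is this sketch's choice, not necessarily the driver's, and real hardware also caps the number of entries per stage, which the sketch ignores.

```c
#include <assert.h>

struct urb_split {
   unsigned vs_size;
   unsigned gs_size;
};

/* total: URB size in arbitrary (hypothetical) units. */
static struct urb_split split_urb(unsigned total, int gs_active)
{
   struct urb_split s;
   if (!gs_active) {
      s.vs_size = total;             /* VS gets everything */
      s.gs_size = 0;
   } else {
      s.vs_size = total / 2;         /* 50/50 split */
      s.gs_size = total - s.vs_size; /* odd remainder goes to the GS */
   }
   return s;
}

/* Entries that fit in a region, given an entry size in the same units. */
static unsigned urb_entries(unsigned size, unsigned entry_size)
{
   return size / entry_size;
}
```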
* i965: Set the maximum number of GS URB entries on Sandybridge. | Kenneth Graunke | 2011-12-07 | 1 | -0/+2
  We never filled this in before because we didn't care.
  I'm skeptical these values are correct; my sources indicate that the maximum number of entries for both the VS and the GS is 256 on both GT1 and GT2. I'm also loath to change it and break stuff.
  Reviewed-by: Paul Berry
* i965: Only convert if/else to conditional adds prior to Gen6. | Paul Berry | 2011-12-07 | 1 | -2/+28
  Normally when outputting instructions in SPF (single program flow) mode, we convert IF and ELSE instructions to conditional ADD instructions applied to the IP register. On platforms prior to Gen6, flow control instructions cause an implied thread switch, so this is a significant savings.
  However, according to the SandyBridge PRM (Volume 4 part 2, p79):
    [Errata DevSNB{WA}] - When SPF is ON, IP may not be updated by non-flow control instructions.
  So we have to disable this optimization on Gen6.
  On later platforms, there is no significant benefit to converting flow control instructions to ADDs, so for the sake of consistency, this patch disables the optimization on later platforms too.
  The reason we never noticed this problem before is that so far we haven't needed to use SPF mode on Gen6. However, later patches in this series will introduce a Gen6 GS program which uses SPF mode.
  Reviewed-by: Kenneth Graunke
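  The resulting rule fits in one predicate. The function name is hypothetical; it simply encodes the condition stated above: convert only in SPF mode and only before Gen6.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: may IF/ELSE be emitted as conditional ADDs to IP? */
static bool can_convert_if_else_to_add(int gen, bool single_program_flow)
{
   /* The conversion only applies in SPF mode. On Gen6 the SNB erratum
    * forbids non-flow-control writes to IP with SPF on, and later
    * generations gain little from it, so restrict it to gen < 6. */
   return single_program_flow && gen < 6;
}
```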
* i965 gs: Remove unnecessary mapping of key->primitive. | Paul Berry | 2011-12-07 | 2 | -16/+7
  Previously, GS generation code contained a lookup table that mapped primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped LINESTRIP to LINELIST, and left all other primitives unchanged. This was silly, because we never generate a GS program for those primitive types anyhow.
  This patch removes the unnecessary lookup table.
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
* i965: Set Ivybridge's is_array SURFACE_STATE bit. | Kenneth Graunke | 2011-12-07 | 1 | -1/+2
  Fixes piglit tests fbo-array, fbo-depth-array, fbo-generatemipmap-array, and array-texture, as well as the array variants of my new textureSize and texelFetch tests.
  Not a candidate for 7.11 because EXT_texture_array wasn't supported.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* i965: Return BRW_DEPTHBUFFER_D32_FLOAT as the null-depthbuffer format. | Kenneth Graunke | 2011-12-07 | 1 | -0/+3
  Fixes many crashes on Ivybridge due to upload_sf_state calling brw_depthbuffer_format without an actual depth buffer. This was a recent regression on master.
  +3992 piglits on Ivybridge.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* intel: Update comment about how depth/stencil miptrees are handled. | Eric Anholt | 2011-12-07 | 1 | -6/+18
  This evolved over several commits, and I also wanted to document some new information about how we handle formats.
  Reviewed-by: Chad Versace
* intel: Rely on miptree mapping for all renderbuffer maps. | Eric Anholt | 2011-12-07 | 2 | -202/+21
  Now that all RBs have miptrees, and miptree mapping covered these last two code paths, consistently use them.
  Reviewed-by: Chad Versace
* intel: Add support for LLC-cached reads of X-tiled miptrees using a blit. | Eric Anholt | 2011-12-07 | 2 | -0/+83
  This mimics the MapRenderbuffer code, and should improve the performance of glGetTexImage().
  v2: Fix broken error handling.
* intel: Handle MapRenderbuffer of fake packed depth/stencil using miptree maps. | Eric Anholt | 2011-12-07 | 1 | -138/+2
  This gets the same performance win as the miptree maps did, and removes a pile of code duplication.
* intel: Track miptrees for fake packed depth/stencil renderbuffers. | Eric Anholt | 2011-12-07 | 1 | -0/+10
  Right now the fake packed d/s RBs are creating two sub-renderbuffers with their own storage, and the hardware setup and the mapping code have been explicitly referencing them. By setting miptrees on them, we'll be able to make our renderbuffer code for fake packed depth/stencil more consistent with all our other renderbuffers.
  The interesting new behavior here is that there is now a mt with a non-depthstencil format (X8Z24) that has a stencil_mt field associated. This looks like it should be safe, and we'll need to be able to do this for floating point depth/stencil as well.
* intel: Make the fake packed depth/stencil mappings use a cached temporary. | Eric Anholt | 2011-12-07 | 2 | -121/+129
  Before, we had an uncached read of S8 to untile, then a RMW (so uncached penalty) of the packed S8Z24 to store the value, then the consumer would do an uncached read of that once per pixel. If data was written to the map, we would then have to do an uncached read of the written data and scatter it to the tiled S8 buffer (also uncached access penalties, since WC couldn't actually combine). So 3 or 5 uncached accesses per pixel in the ROI (and we were ignoring the ROI, so it was the whole image).
  Now we get an uncached read of S8 to untile, and an uncached read of Z. The consumer gets to do cached accesses. Then if data was written, we do streaming Z writes (WC success), and scattered S8 tiling writes (uncached penalty). So 2 or 3 uncached accesses per pixel in the ROI.
  This should be a performance win, to the extent that anybody is doing software accesses of packed depth/stencil buffers.
  Reviewed-by: Chad Versace
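  The access counts above can be turned into totals for a region. The helper names are hypothetical; the per-pixel figures are taken directly from the message (before: 3 read-only, 5 when written; after: 2 read-only, 3 when written).

```c
#include <assert.h>
#include <stdbool.h>

/* Uncached accesses per pixel before the change. */
static unsigned uncached_per_pixel_before(bool written)
{
   /* S8 untile read + S8Z24 read-modify-write + consumer read;
    * writing adds a read-back and the S8 scatter. */
   return written ? 5 : 3;
}

/* Uncached accesses per pixel after the change. */
static unsigned uncached_per_pixel_after(bool written)
{
   /* S8 untile read + Z read; the consumer hits the cached temporary.
    * On write-back, Z streams through WC and only the S8 scatter pays. */
   return written ? 3 : 2;
}

/* Total uncached accesses for a width x height region. */
static unsigned long total_uncached(unsigned per_pixel,
                                    unsigned width, unsigned height)
{
   return (unsigned long)per_pixel * width * height;
}
```

  For a written 64x64 region this gives 20480 uncached accesses before and 12288 after. Since the old path ignored the ROI, its real cost was computed over the whole image rather than just the region.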
* intel: Make intel_region_map return void *. | Eric Anholt | 2011-12-07 | 2 | -4/+4
  We don't gripe about void * arithmetic for our driver, and this prevents silly casting when assigning the result of mapping to non-byte types.
  Reviewed-by: Chad Versace
* intel: Move separate-stencil s8 mapping logic to intel_miptree_map. | Eric Anholt | 2011-12-07 | 2 | -113/+112
  We're going to want to reuse this logic in mapping of fake packed miptrees wrapping separate depth/stencil miptrees.
  Reviewed-by: Chad Versace
* intel: Move the gtt-particular texture mapping logic to a helper function. | Eric Anholt | 2011-12-07 | 1 | -49/+71
  This code will be incrementally moving to a model like intel_fbo.c's renderbuffer mapping with helper functions, as I move that code here.
  Reviewed-by: Chad Versace
* intel: Make mapping of texture slices track the region of interest. | Eric Anholt | 2011-12-07 | 2 | -5/+51
  This will be used for things like packed depth/stencil temporaries and making LLC-cached temporary mappings using blits.
  Reviewed-by: Chad Versace
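  Tracking the region of interest mainly means recording its rectangle alongside the mapping, so that the start of the ROI and the size of a tightly packed temporary can be derived. The struct and functions below are hypothetical illustrations, not the driver's intel_miptree_map fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a slice mapping restricted to a region of interest. */
struct slice_map {
   unsigned x, y, w, h;  /* ROI within the slice, in pixels */
   size_t stride;        /* bytes per row of the slice */
   unsigned cpp;         /* bytes per pixel */
};

/* Byte offset from the slice base to the first pixel of the ROI. */
static size_t roi_offset(const struct slice_map *m)
{
   return (size_t)m->y * m->stride + (size_t)m->x * m->cpp;
}

/* Bytes a temporary needs to hold only the ROI, tightly packed. */
static size_t roi_temp_size(const struct slice_map *m)
{
   return (size_t)m->w * m->h * m->cpp;
}
```

  A temporary sized by roi_temp_size() holds only the requested rectangle rather than the whole slice, which is what makes ROI-sized packed depth/stencil and blit temporaries worthwhile.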
* intel: Move the teximage mapping logic to a miptree level/slice mapping. | Eric Anholt | 2011-12-07 | 3 | -48/+109
  This will let us share teximage mapping logic with renderbuffer mapping, which has an intel_mipmap_tree but not a gl_texture_image.
  Reviewed-by: Chad Versace
* intel: Only prefer separate stencil when we can do HiZ. | Eric Anholt | 2011-12-07 | 2 | -4/+14
  This required is_hiz_depth_format to start returning true on S8_Z24 as well, since that's the format we have here. The two previous callers are only calling it on non-depthstencil formats.
  This avoids us needing to have HiZ working on a new Z format immediately upon exposing the format (particularly painful for Z32_FLOAT_X24S8, which means all the fake packed depth/stencil paths).
  Reviewed-by: Chad Versace
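  The new policy can be sketched as a predicate. Everything here is hypothetical: the enum and `hiz_capable_format()` stand in for is_hiz_depth_format(), whose real signature differs, and the set of HiZ-capable formats is an assumption except that S8_Z24 now counts and a newly exposed format such as Z32_FLOAT does not yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format enum. */
enum depth_format { FMT_Z16, FMT_X8_Z24, FMT_S8_Z24, FMT_Z32_FLOAT };

/* Hypothetical stand-in for is_hiz_depth_format(); which formats
 * qualify is assumed for illustration. */
static bool hiz_capable_format(bool has_hiz, enum depth_format f)
{
   if (!has_hiz)
      return false;
   return f == FMT_X8_Z24 || f == FMT_S8_Z24;
}

/* Prefer separate stencil only when the hardware supports it AND HiZ
 * can be used for this depth format. */
static bool prefer_separate_stencil(bool has_separate_stencil, bool has_hiz,
                                    enum depth_format f)
{
   return has_separate_stencil && hiz_capable_format(has_hiz, f);
}
```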
* i965: Set SURFACE_STATE vertical alignment bit on Ivybridge. | Kenneth Graunke | 2011-12-06 | 1 | -0/+7
  See intel_vertical_texture_alignment_unit() in intel_tex_layout.c; certain surface types require setting this to VALIGN_4.
  Analogous to commit dd0e46c4102976b7d317104ecd1bb565ac34613a on Gen6.
  Fixes piglit test fbo-generatemipmap-formats with the GL_ARB_depth_texture and GL_EXT_packed_depth_stencil arguments.
  Signed-off-by: Kenneth Graunke
* radeon: add original r100 to the always tiled depth list. | Dave Airlie | 2011-12-06 | 1 | -1/+1
  Alex thinks r100 is also covered.
  Signed-off-by: Dave Airlie
* osmesa: remove unused bpc variable | Fabio Pedretti | 2011-12-06 | 1 | -8/+0
  Signed-off-by: Brian Paul
* radeon/r200: add RV200 detiling + add an always tiled flag | Dave Airlie | 2011-12-06 | 3 | -37/+72
  Passes readpix sanity on the M7.
  Signed-off-by: Dave Airlie
* r200: add Z16 depth detiling. | Dave Airlie | 2011-12-06 | 1 | -0/+105
  This passes readPixSanity with z16 visuals.
  Signed-off-by: Dave Airlie
* r200: handle Z24 depth buffers correctly | Dave Airlie | 2011-12-06 | 1 | -2/+2
  The same detiling pattern applies to X8_Z24 as well.
  Signed-off-by: Dave Airlie
* r200: fix cb microtile setup | Dave Airlie | 2011-12-06 | 1 | -0/+3
  We shouldn't see this in buffers from the DDX, but just in case.
  Signed-off-by: Dave Airlie
* r200: enable tiling flags on blitter setup. | Dave Airlie | 2011-12-06 | 1 | -0/+10
  The r200 blitter also didn't set the correct tiling flags.
  Signed-off-by: Dave Airlie
* i965: Fix incorrect comment about single program flow on Ironlake. | Kenneth Graunke | 2011-12-05 | 1 | -1/+1
  The code forces single program flow to be enabled on Ironlake, or equivalently, disables multiple program flow. The comment was reversed.
  Signed-off-by: Kenneth Graunke
  Reviewed-by: Eric Anholt
* radeon/r200: drop old span depth/stencil code. | Dave Airlie | 2011-12-05 | 1 | -317/+0
  This is no longer used with the new renderbuffer code.
  Signed-off-by: Dave Airlie
* radeon/r200: add draw/stencil buffer detiling | Dave Airlie | 2011-12-05 | 2 | -0/+111
  This moves the detiling to the FBO mapping code: r200 depth is always tiled, and we can't detile it with the blitter.
  Signed-off-by: Dave Airlie
* radeon: fix warnings | Dave Airlie | 2011-12-05 | 1 | -2/+2
* radeon: use mesa renderbuffer accessors for depth for now. | Dave Airlie | 2011-12-05 | 1 | -4/+5
  Signed-off-by: Dave Airlie
* radeon: add some tiling support for r100. | Dave Airlie | 2011-12-05 | 2 | -0/+13
  This sets up the tiling flags on the blitter. Fixes some piglit tests with tiling enabled.
  Signed-off-by: Dave Airlie