summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* Clean up .gitignore filesMatt Turner2013-01-107-8/+0
|
* intel: Clean up confusion between logical and physical surface dimensions.Paul Berry2013-01-098-153/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In most cases, the width, height, and depth of the physical surface used by the driver to implement a texture or renderbuffer is equal to the logical width, height, and depth exposed to the client through functions such as glTexImage3D(). However, there are two exceptions: cube maps (which have a physical depth of 6 but a logical depth of 1) and multisampled renderbuffers (which have larger physical dimensions than logical dimensions to allow multiple samples per pixel). Previous to this patch, we accounted for the difference between physical and logical surface dimensions at inconsistent places in the call graph (multisampling was accounted for in intel_miptree_create_for_renderbuffer(), and cubemaps were accounted for in intel_miptree_create_internal()). As a result, it wasn't always clear, when calling a miptree creation function, whether physical or logical dimensions were needed. Also, we weren't consistent about storing logical dimensions in the intel_mipmap_tree structure (we only did so in the intel_miptree_create_for_renderbuffer() code path, and we did not store depth). This patch refactors things so that intel_miptree_create_internal() is responsible for converting logical to physical dimensions and for storing both the physical and logical dimensions in the intel_mipmap_tree structure. As a result, all miptree creation functions interpret their arguments as logical dimensions, and both physical and logical dimensions are always available to functions that work with intel_mipmap_trees. In addition, it renames the fields in intel_mipmap_tree used to store the dimensions, so that it is clear from the name whether physical or logical dimensions are being referred to. This should fix the following bugs: - When creating a separate stencil surface for a depthstencil cubemap, we would erroneously try to convert the depth from 1 to 6 twice, resulting in an assertion failure. - When creating an MCS buffer for compressed multisampling, we used physical dimensions instead of logical dimensions, resulting in wasted memory. In addition, this should considerably simplify the implementation of ARB_texture_multisample, because it moves the code to compute the physical size of multisampled surfaces out of renderbuffer-only code. Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel: Add a force_y_tiling parameter to intel_miptree_create().Paul Berry2013-01-095-26/+34
| | | | | | | | | | | This allows intel_miptree_alloc_mcs() to force Y tiling for the MCS buffer. Previously we accomplished this by the hack of passing INTEL_MSAA_LAYOUT_CMS as the msaa_layout parameter, but that parameter is going to be going away soon. Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* intel: Move compute_msaa_layout earlier in file.Paul Berry2013-01-091-38/+41
| | | | | | | | | | No functional change. This patch moves the compute_msaa_layout() function earlier in intel_mipmap_tree.c so that it can be used by other functions in that file. Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/fs: Fix struct vs. class in acp_entry definitions.Kenneth Graunke2013-01-081-1/+1
|
* mesa: Add ALIGN() macro to main/macros.h.Paul Berry2013-01-082-13/+1
| | | | | | | | | Previously this macro existed in 3 separate places, some inside the intel driver and some outside of it. It makes more sense to have it in main/macros.h Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Support GL_FIXED and packed vertex formats natively on Haswell+.Kenneth Graunke2013-01-072-12/+58
| | | | | | | Haswell and later support the GL_FIXED and 2_10_10_10_rev vertex formats natively, and don't need shader workarounds. Reviewed-by: Eric Anholt <[email protected]>
* i965: Add #defines for GL_FIXED vertex formats.Kenneth Graunke2013-01-071-0/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Add remaining #defines for packed vertex formats.Kenneth Graunke2013-01-071-0/+9
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Use Haswell's sample_d_c for textureGrad with shadow samplers.Kenneth Graunke2013-01-074-5/+17
| | | | | | The new hardware actually just supports this now. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Remove dead code from generate_uniform_pull_constant_load_gen7.Kenneth Graunke2013-01-071-2/+0
| | | | | | | generate_uniform_pull_constant_load_gen7() is only called on Gen7+, so the gen < 6 code is dead. Reviewed-by: Eric Anholt <[email protected]>
* intel: Fix copy-and-paste bug setting gl_constants::MaxSamplesIan Romanick2013-01-041-1/+1
| | | | | | | | | | gl_constants::MaxSamples is an integer, so setting it to 1.0 is just silly. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix glCompressedTexSubImage2D offsets for ETC textures.Paul Berry2013-01-041-0/+3
| | | | | | | | | | | This patch fixes intel_miptree_unmap_etc() (which decompresses ETC textures to linear) to pay attention to map->x and map->y when writing to the destination image. Previously these values were ignored, causing the xoffset and yoffset parameters passed to glCompressedTexSubImage2D() to be ignored. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Replace structs with bit-shifting for Gen7 SURFACE_STATE entries.Kenneth Graunke2013-01-036-360/+252
| | | | | | | | | | | | | | | | | Every generation except Gen7 creates SURFACE_STATE entries via a uint32_t array. Only Gen7 uses the older bitfield structure, which we moved away from because it was less efficient. Convert it for consistency. This reduces the compiled size of gen7_wm_surface_state.o by 2.86% in a release build. v2: Fix accidental use of BRW_SURFACE_WIDTH/HEIGHT in brw_state_dump.c; switch back to gen7_set_surface_mcs_info setting surf[6] directly (both per Eric's review comments). Acked-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* radeon/r200: Fix tcl cullingsmoki2013-01-032-18/+8
| | | | | Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=57842
* i965: Add break statement at end of BRW_OPCODE_CONTINUE case.Vinson Lee2013-01-021-0/+1
| | | | | | | | Fixes missing break in switch defect reported by Coverity. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Fix error reporting in _mesa_invalidate_pbo_{compressed_,}teximage.Paul Berry2013-01-021-2/+2
| | | | | | | | | | | The old error reporting was completely bogus, passing _mesa_error() a format string that didn't even match the remaining arguments. Also, in many cases the number of dimensions in the TexImage call was not preserved in the error message (e.g. an error in glTexImage2D was reported simply as an error in glTexImage). Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Fail to blit rather than assert on invalid pitch requirements.Kenneth Graunke2012-12-291-2/+2
| | | | | | | | | | | | | | | | | | | Dungeon Defenders hits TexImage()'s try_pbo_upload() path where image->Width == 2, which doesn't meet intelEmitCopyBlit's requirement that the pitch needs to be a multiple of 4. Since intelEmitCopyBlit can already fail for a myriad of other reasons, and it's not clear that other callers are immune to this failure mode, simply make it return false rather than assert. Fixes Dungeon Defenders on i965/Ivybridge. Now playable (aside from having to work around the EXT_bindable_uniform issue). NOTE: This is probably a candidate for the 9.0 branch. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel: Skip texture validation logic when nothing has changed.Eric Anholt2012-12-285-2/+30
| | | | | | Improves GLBenchmark 2.1 offscreen performance by 3.2% +/- 1.5% (n=52). Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Turn a test in miptree_match_image into an assert.Eric Anholt2012-12-281-2/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Stop making a copy of non-builtin uniforms in ParameterValues[].Eric Anholt2012-12-281-3/+0
| | | | | | | | We don't need them now that our set of parameter pointers points at the GL core storage for them. This should save memory/bandwidth/overhead in uniform updates. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Consistently use nr_pull_params instead of NumParameters.Eric Anholt2012-12-282-4/+3
| | | | | | | | | | NumParameters used to be an upper bound on the number of vec4s to be uploaded, which was basically safe (unless your buffer was bound near the top of address space *and* you array indexed outside the buffer, in which case I think you might GPU hang). As I migrate the driver away from ParameterValues[], this is no longer true. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Reference the core GL uniform storage for non-builtin uniforms.Eric Anholt2012-12-282-54/+35
| | | | | | | Like in the FS, there's no reason to use an external copy if the ParameterValues[] relayout of it isn't the layout we need. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Reference the core GL uniform storage for non-builtin uniforms.Eric Anholt2012-12-283-44/+30
| | | | | | | There's no reason to use an external copy if the relayout in the external copy isn't serving us. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove the param_index/param_offset indirection.Eric Anholt2012-12-284-46/+10
| | | | | | | Now that ParameterValues doesn't change across the visitor, we don't need to go through this. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add asserts to check that we don't realloc ParameterValues.Eric Anholt2012-12-284-0/+19
| | | | | | | Things are even more restrictive than they used to be, so I've made mistakes in this area. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add texrect scale parameters before pointers to ParameterValues.Eric Anholt2012-12-283-0/+24
| | | | | | | | | | | | | If adding scale parameters during program compile caused a realloc of ParameterValues, then the driver uniform storage set up by _mesa_associate_uniform_storage() would point to potentially freed memory. Note that this uses TexturesUsed, which may change at runtime for GLSL when sampler uniforms change. This is a flaw in our handling of texrect in general, and not one I'm fixing currently. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix a typo in a comment.Eric Anholt2012-12-281-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add a note about a bug from the no-recompile-on-sampler-updates change.Eric Anholt2012-12-281-0/+4
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix border color handling for deprecated SNORM formats.Eric Anholt2012-12-261-2/+29
| | | | | | | | | | We don't have native hardware support for these, so they get promoted to RGBA, in which case we don't have hardware dealing with the channel swizzling for us. Fixes piglit EXT_texture_snorm/texwrap formats bordercolor (-swizzled). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Start using HIZ for Z16 textures.Eric Anholt2012-12-261-0/+1
| | | | | | | | | | I had left this out for a long time because it regressed some depthstencil-render-miplevels cases when it was enabled. Now that the bugs causing those are fixed, there's nothing stopping us. Improves glbenchmark 2.1 offscreen performance by 7.3% +/- 2.8% (n=10). Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Use the parent miptree's format for setting up HiZ miptrees.Eric Anholt2012-12-261-1/+1
| | | | | | | This worked out before because the parent was always 4 bytes so it didn't affect the layout, but now we want to support Z16 too. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove a couple of dead function prototypes.Eric Anholt2012-12-221-5/+0
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965: Add perf debug for depth/stencil alignment workaround.Eric Anholt2012-12-221-0/+16
| | | | | | | | Fixing these rendering bugs has been implicated in performance regressions (which may be unfixable), but at least knowing that it's happening should help diagnose those regressions. Reviewed-by: Jordan Justen <[email protected]>
* i965: Assert that relayout laid out something that won't need it again.Eric Anholt2012-12-221-0/+6
| | | | | | | | The ETC1 changes failed at this, so let's make sure it will be caught in testing next time. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Also fix validation of Z32F_S8 textures.Eric Anholt2012-12-221-0/+2
| | | | | | | | | This was caught by the assertion in the next commit. It fixes the remaining piglit depthstencil-render-miplevels cases, probably by avoiding broken stencil copies in the validation path. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix validation of ETC miptrees.Eric Anholt2012-12-221-5/+7
| | | | | | | | | | | When comparing to the teximage's format, we have to look at the format-the-mt-was-created-for not the format-actually-stored-in-the-mt. Improves glbenchmark 2.1 offscreen test performance 159% +/- 17% (n=3). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54582 Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* qi965: Add perf debug for texture relayout.Eric Anholt2012-12-221-0/+5
| | | | | | | | Relayout is expensive, so it's something developers (both us and others) should know about when it happens. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Fix hiz resolves getting stomped by depth offset validation.Eric Anholt2012-12-221-5/+5
| | | | | | | Fixes all the remaining non-Z32F_S8 depthstencil-render-miplevels tests in piglit. Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix gl_VertexID when there are no other vertex inputs.Paul Berry2012-12-181-3/+3
| | | | | | | | | | | | brw_emit_vertices contains special case logic to handle the case where a vertex shader doesn't read any inputs. This special case logic was incorrectly activating in the case were the only vertex input is gl_VertexID. As a result, if a shader used gl_VertexID but used no other inputs, then all vertices got a gl_VertexID of zero. Fixes oglconform test "ubo-usage advanced.transform_feedback". Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Make a function is_transform_feedback_active_and_unpaused.Paul Berry2012-12-184-8/+7
| | | | | | | | | | The rather unweildy logic for determining this condition was repeated in a large number of places. This patch consolidates it to a single inline function. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: refactor _mesa_compute_max_transform_feedback_vertices from i965.Paul Berry2012-12-181-12/+4
| | | | | | | | | | | | | | | | | Previously, the i965 driver contained code to compute the maximum number of vertices that could be written without overflowing any transform feedback buffers. This code wasn't driver-specific, and for GLES3 support we're going to need to use it in core mesa. So this patch moves the code into a core mesa function, _mesa_compute_max_transform_feedback_vertices(). Reviewed-by: Ian Romanick <[email protected]> v2: Eliminate C++-style variable declarations, since these won't work with MSVC. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Change args to vbo_count_tessellated_primitives.Paul Berry2012-12-181-1/+3
| | | | | | | | | | | | | | No functional change--this simply paves the way to allow futures patches to call vbo_count_tessellated_primitives() during error checking, before the _mesa_prim struct has been constructed. This will be needed for GLES3, which requires draw calls to fail if there is not enough space available in transform feedback buffers to accommodate the primitives to be drawn. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* drivers: compute version and then initialize exec tableJordan Justen2012-12-167-0/+50
| | | | | | | | This change forces the context version to be computed before initilizing the exec dispatch tables. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move BRW_MAX_GRF and similar defines to brw_reg.h.Kenneth Graunke2012-12-152-18/+17
| | | | | | These don't really belong in brw_structs.h. Reviewed-by: Eric Anholt <[email protected]>
* i965: Split struct brw_reg out from brw_eu.h into its own header.Kenneth Graunke2012-12-152-709/+778
| | | | | | | | | | | | | | | | | | | struct brw_instruction and the related instruction emitting code won't be useful on Gen8+, as the instruction encoding changed. However, the struct brw_reg code is still extremely valuable. While we're at it, fix up some style points: - s/GLuint/unsigned/g - s/GLint/int/g - s/GLshort/int16_t/g - s/GLushort/uint16_t/g - s/INLINE/inline/g - Replace tabs with spaces - Put return types on a separate line from the function name/parameters - Remove trailing whitespace - Remove extraneous whitespace around function parameters Reviewed-by: Eric Anholt <[email protected]>
* i965: Add missing autoconf bits so test_vec4_register_coalesce will buildIan Romanick2012-12-141-0/+3
| | | | | Signed-off-by: Ian Romanick <[email protected]> Tested-by: Eric Anholt <[email protected]>
* i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too.Eric Anholt2012-12-143-61/+128
| | | | | | | | | No statistically significant performance difference on glbenchmark 2.7 (n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8% (n=5), but that's only about .3% of all cycles spent according to the fixed shader_time. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling"Eric Anholt2012-12-143-9/+113
| | | | | | | | | | | | | The way our visitor works, scalar expression/swizzle results that get stored in channels other than .x will have an intermediate MOV from their result in the .x channel to the real .y (or whatever) channel, and similarly for vec2/vec3 results. By knowing how to adjust DP4-type instructions for optimizing out a swizzled MOV, we can reduce instructions in common matrix multiplication cases. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add a unit test for opt_compute_to_mrf().Eric Anholt2012-12-143-2/+133
| | | | | | | | | | | | | The compute-to-mrf code is really twitchy, and it's hard to construct GLSL testcases for it. This unit test is also really hard to work with (for example, if your instruction is removed by dead code elimination, you end up inspecting something irrelevant), but I did use it for debugging some of the commits to follow. I called it test_vec4_register_coalesce because the compute-to-mrf code is about to morph into that. Reviewed-by: Kenneth Graunke <[email protected]>