summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* intel/hsw: Enable hiz (v2)Chad Versace2013-04-102-2/+51
| | | | | | | | | | | | | | | | | | | Enable hiz by setting intel_context::has_hiz. However, to work around a hardware bug, we selectively enable hiz for only nicely aligned miptree slices. No Piglit regressions on Haswell 0x0d26 rev07 when based atop mesa-master-4ad3601. Improves the performance of GLB27_TRex_C24Z16_FixedTimeStep by 18.52% (hsw-0x0d26-rev07; kernel-3.9.0-rc1; GLBenchmark 2.7.0 Release a68901; samples=3). v2: Replace the check for IS_HASWELL(devid) in intel_miptree_slice_has_hiz() with a conditional set of has_hiz. [for anholt] Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Remove brw_context::depthstencil::hiz_mtChad Versace2013-04-102-3/+0
| | | | | | | | After recent refactorings, the field is written but no longer read. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* intel: Replace checks for hiz_mt with intel_has*hiz()Chad Versace2013-04-108-40/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When appropriate, replace each check `hiz_mt != NULL` with either a call to intel_miptree_slice_has_hiz() or intel_renderbuffer_has_hiz(). No behavioral change. This prepares for selectively enabling hiz on individual miptree slices for Haswell. This refactoring had several side effects. 1. To prevent new warnings about discarding the const qualifier, I removed 'const' from some variable declarations in intel_validate_framebuffer(). The alternative was to add const qualifiers to multiple function signatures in the intel_renderbuffer_has_hiz call graph. Since the dominant convention in the Intel code is to not qualify function parameters as const, I chose to remove rather than add const qualifiers. 2. I changed the signature of brw_emit_depth_stencil_hiz() by replacing `struct intel_mipmap_tree *hiz_mt` with `bool hiz`. The function used hiz_mt mostly as a boolean indicator of the presence of hiz, so the signature change is consistent with the patch's goal. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Change signature of brw_get_depthstencil_tile_masks()Chad Versace2013-04-104-3/+16
| | | | | | | | | | Add new parameters `depth_level` and `depth_layer`, which specify depth miptree's slice of interest. A following patch will pass the new parameters through to intel_miptree_slice_has_hiz(). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/blorp: Add fields brw_blorp_mip_info::level,layerChad Versace2013-04-102-0/+15
| | | | | | | | | The new fields define the 2D miptree slice to be used. A following patch will pass the new fields through to intel_miptree_slice_has_hiz(). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* intel: Add field intel_mipmap_slice::has_hizChad Versace2013-04-104-2/+44
| | | | | | | | | | | | | | | | | On Haswell, HiZ will selectively be enabled on individual miptree slices to workaround a hardware bug. The new field 'has_hiz' indicates if HiZ is enabled for a given slice. Also add two new accessor functions for this field. intel_miptree_slice_has_hiz intel_renderbuffer_has_hiz The new field and accessor functions are not yet used. Also, this patch introduces no behavioral change because, in this patch, intel_miptree_alloc_hiz() sets has_hiz for all slices. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/blorp: Align rectangle primitive for hiz opsChad Versace2013-04-101-0/+29
| | | | | | | | | | The hardware docs and the simulator require that the rectangle primitive emitted during fast depth clears and hiz resolves must be aligned to 8x4 pixels. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/vs: Use GRFs for pull constant offsets on gen7.Eric Anholt2013-04-106-22/+56
| | | | | | | | | | | | | This allows the computation of the offset to get written directly into the message source. shader-db results: total instructions in shared programs: 3308390 -> 3283025 (-0.77%) instructions in affected programs: 442998 -> 417633 (-5.73%) No difference in GLB2.7 low res (n=9). Reviewed-by: Matt Turner <[email protected]>
* i965/vs: When asked to make a dst_reg for a src.xxxx, just write to src.x.Eric Anholt2013-04-101-1/+8
| | | | | | | | | | | | | We have several places in our pull constant handling where we make a temporary src_reg for an int, and then turn it into a dst. In doing so, we were writing to the dst.xyzw, so we never register coalesced it with a later mov from dst.x to real_dst.x. These extra channels written would be removed if we had channel-wise DCE in the backend, but we don't. Fix it for now by just not writing these extra channels that won't get used. Reviewed-by: Matt Turner <[email protected]>
* i965/gen6: Reduce updates of transform feedback offsets with HW contexts.Eric Anholt2013-04-101-1/+1
| | | | | | | | | | | | The software-tracked transform feedback offsets (svbi_0_starting_index) are incorrect in the presence of primitive restart, so we were actually updating it with a bogus value if the batch wrapped and we emitted the packet again during a single transform feedback. By reducing state emission, we avoid the bug. Fixes piglit OpenGL 3.1/primitive-restart-xfb flush Reviewed-by: Paul Berry <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* i965/gen7: Skip resetting SOL offsets at batch start with HW contexts.Eric Anholt2013-04-102-6/+21
| | | | | | | | | | | | The software-tracked transform feedback offsets (svbi_0_starting_index) are incorrect in the presence of primitive restart, so we can't reliably compute offsets for our buffer pointers after a batch flush. Thanks to HW contexts, our transform feedback offsets are now saved, so we can just keep using the ones from before the batch wrap. Fixes piglit OpenGL 3.1/primitive-restart-xfb flush Reviewed-by: Paul Berry <[email protected]> NOTE: This is a candidate for the 9.1 branch.
* st/mesa: remove #if FEATURE_GL/ES testsBrian Paul2013-04-091-7/+0
| | | | Reviewed-by: Jordan Justen <[email protected]>
* mesa: remove old comment about FEATURE_GLBrian Paul2013-04-091-2/+1
| | | | Reviewed-by: Jordan Justen <[email protected]>
* mesa: remove #ifdef FEATURE_ES2, add some comments insteadBrian Paul2013-04-091-2/+9
| | | | Reviewed-by: Jordan Justen <[email protected]>
* st/mesa: remove #include mfeatures.hBrian Paul2013-04-0924-24/+0
| | | | | | None of these were needed. Reviewed-by: Jordan Justen <[email protected]>
* mesa: update derived framebuffer state in GetMultisamplefvMarek Olšák2013-04-101-0/+5
| | | | | | This makes sure that ctx->DrawBuffer->Visual.samples is up-to-date. Reviewed-by: Eric Anholt <[email protected]>
* mesa: fix glGet queries depending on derived framebuffer state (v2)Marek Olšák2013-04-102-7/+26
| | | | | | | | | | | | | "ctx->DrawBuffer->Visual" might be invalid if (NewState &_NEW_BUFFERS) != 0. v2: also fix: - RGBA_INTEGER_MODE_EXT - RGBA_FLOAT_MODE_ARB (also check API support) - FRAMEBUFFER_SRGB_CAPABLE_EXT NOTE: This is a candidate for stable branches. Reviewed-by: Eric Anholt <[email protected]>
* i965/gen7.5: Allow HW primitive restart for all primitive types.Paul Berry2013-04-091-7/+6
| | | | | | | | | | | | | Gen7.5 (Haswell) hardware supports primitive restart for all primitive types. It also handles all possible primitive restart indices. Rather than specialize both can_cut_index_handle_restart_index() and the switch statement in can_cut_index_handle_prims() for Haswell, just return early if the hardware is Haswell because we know it can handle everything. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Only use brw_draw.c's trim() function when necessary.Paul Berry2013-04-091-2/+14
| | | | | | | | | | | | | | | | | | | | brw_draw.c contains a trim() function which modifies the vertex count for quads and quad strips in order to discard dangling vertices. In principle this shouldn't be necessary, since hardware since Gen4 is capable of discarding dangling vertices by itself. However, it's necessary because as a hack to speed up rendering on Gen 4-5, we sometimes convert quads to trifans and quad strips to tristrips. The trim() function isn't necessary on Gen6 and up. This patch documents why and when the trim() function is necessary, and avoids calling it when it's not needed. This will avoid creating problems when we enable hardware support for primitive restart of quads and quad strips on Haswell. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Fix DEBUG_SHADER_TIME when VS terminates with 2 URB writes.Paul Berry2013-04-091-4/+2
| | | | | | | | | | | The call to emit_shader_time_end() before the second URB write was conditioned with "if (eot)", but eot is always false in this code path, so emit_shader_time_end() was never being called for vertex shaders that performed 2 URB writes. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Fix ir_print_visitor's handling of interpolation qualifiers.Paul Berry2013-04-091-1/+2
| | | | | | | | | | | This patch updates the interp[] array to match the enum glsl_interp_qualifier. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> v2: Add a STATIC_ASSERT to make sure the array is the correct size. This required adding INTERP_QUALIFIER_COUNT to the enum.
* intel: Remove the texture_tiling driconf option.Kenneth Graunke2013-04-084-11/+1
| | | | | | | | | | | | This option can force textures to be untiled. However, on Gen6+, depth buffers must be Y-tiled. MSAA buffers also must be Y-tiled. So setting this option on even a trivial application like glxgears causes assertion failures in a debug build, and likely GPU hangs in a release build. It's just giving users a license to shoot themselves in the foot. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Prefer Y-tiling on Gen6+.Kenneth Graunke2013-04-081-1/+1
| | | | | | | | | | | | | | | | | In the past, we preferred X-tiling for color buffers because our BLT code couldn't handle Y-tiling. However, the BLT paths have been largely replaced by BLORP on Gen6+, which can handle any kind of tiling. We hadn't measured any performance improvement in the past, but that's probably because compressed textures were all untiled anyway. Improves performance in GLB27_TRex_C24Z16_FixedTime by 7.69231%. v2: Rebase on top of Eric's untiled-for-larger-than-aperture changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use tiling even for compressed textures.Kenneth Graunke2013-04-081-1/+1
| | | | | | | | | | | | | | The code has no rationale for why we would force compressed textures to be untiled, and it appears to work fine. Git archeology indicates that it's been that way dating back to when we first started tiling. Improves performance in GLB27_TRex_C24Z16_FixedTimeStep at 1280x720 by 10.0529% +/- 0.573075% (n=12). Improves performance in Xonotic by 4.56409% +/- 0.27965% (n=3). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Refactor selection of miptree tilingChad Versace2013-04-081-42/+61
| | | | | | | | | | | | | | | | | | This patch (1) extracts from intel_miptree_create() the spaghetti logic that selects the tiling format, (2) rewrites that spaghetti into a lucid form, and (3) moves it to a new function, intel_miptree_choose_tiling(). No behavioral change. As a bonus, it is now evident that the force_y_tiling parameter to intel_miptree_create() does not really force Y tiling. v2 (Ken): Rebase on top of Eric's untiled-for-larger-than-aperture changes. This required passing in the miptree. Signed-off-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Allocate hiz in intel_renderbuffer_move_to_temp()Chad Versace2013-04-081-0/+4
| | | | | | | | | | | | | When moving the renderbuffer to a new miptree, we neglected to allocate the hiz buffer for the new miptree. Oops. Fixes all Piglit depthstencil-render-miplevels tests from crash to pass on Sandybridge. Note: This is a candidate for the 9.1 branch. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* st/mesa: fix levels in initial texture creationDave Airlie2013-04-081-1/+1
| | | | | | | | | | | calim pointed out we were getting mipmap levels for array multisamples, this didn't make sense. So then I noticed this function takes last_level so we are passing in a too high value here. I think this should fix the case he was seeing. Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* intel: Avoid making tiled miptrees we won't be able to blit.Eric Anholt2013-04-081-14/+21
| | | | | | | | | | | | | | Doing so was breaking miptree mapping, which we really need to be able to handle. With this change, intel_miptree_map_direct() falls through to doing a CPU mapping on the buffer like we need. With the previous 2 patches, all of these should be fixed: piglit max-texture-size (all 3 patches required!) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37871 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44958 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53494 Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Do temporary CPU maps of textures that are too big to GTT map.Eric Anholt2013-04-081-0/+21
| | | | | | | | This still fails, since 8192*4bpp == 32768, which is too big to use the blitter on. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]>
* intel: Add support for writing to our linear-temporary-CPU-map case.Eric Anholt2013-04-081-2/+23
| | | | | | | This will be used for handling updates of large textures. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>.
* intel: Remove check for kernel 2.6.29.Kenneth Graunke2013-04-081-7/+0
| | | | | | | | | | | | | Now that we require 2.6.39, there's no need to also check for 2.6.29. Calling drm_intel_bufmgr_gem_enable_fenced_relocs() without checking should be safe, as it simply sets a flag. This does remove the check for zero fences available, but that doesn't seem worth checking. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel: Require kernel 2.6.39 for relaxed relocation support.Kenneth Graunke2013-04-082-5/+4
| | | | | | | | | | Chris Wilson's relaxed relocation patch landed in March 2011. Anyone running pre-3.0 kernels probably isn't going to get the latest Mesa anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove a few BRW_STATE_... enum values.Kenneth Graunke2013-04-081-2/+0
| | | | | | | | These were likely used for BRW_NEW_... dirty bit flags at one point, but they're unused now. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove brw->vb.info and struct brw_vertex_info.Kenneth Graunke2013-04-082-13/+0
| | | | | | | Nobody uses this value, so there's no need to set it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove the BRW_NEW_INPUT_DIMENSIONS flag.Kenneth Graunke2013-04-083-8/+0
| | | | | | | | | | When I removed the proj_attrib_mask optimization, I also removed the last consumer of this bit without realizing it. Since nobody uses it, there's no point in flagging it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* register_allocate: Fix the type of best_benefit.Matt Turner2013-04-081-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: don't expose ARB_color_buffer_float without driver support in GL coreMarek Olšák2013-04-061-0/+11
| | | | Reviewed-by: Brian Paul <[email protected]>
* mesa: allow drivers not to expose ARB_color_buffer_float in GL core profileMarek Olšák2013-04-067-15/+48
| | | | Reviewed-by: Brian Paul <[email protected]>
* mesa: move updating clamp control derived state out of mesa_update_state_lockedMarek Olšák2013-04-064-36/+40
| | | | | | | | | It has 2 dependencies: glClampColor and the framebuffer, we might just as well do the update where those two are changed. v2: cosmetic changes from Brian's email Reviewed-by: Brian Paul <[email protected]>
* mesa: don't set _ClampFragmentColor to TRUE if it has no effectMarek Olšák2013-04-0613-21/+37
| | | | | | | | This should reduce shader recompilations with drivers that emulate fragment color clamping, because we want the clamping to be enabled only if there is a signed normalized or floating-point colorbuffer. Reviewed-by: Brian Paul <[email protected]>
* mesa: refactor clamping controls, get rid of _ClampReadColorMarek Olšák2013-04-067-30/+58
| | | | | | v2: cosmetic changes from Brian's email Reviewed-by: Brian Paul <[email protected]>
* mesa: don't memcmp() off the end of a cache key.Chris Forbes2013-04-061-2/+9
| | | | | | | | | | | | | | | | Reported-by: `per` in #intel-gfx The size of the cache key varies, so store the actual size as well as the key blob itself, rather than just assuming it's the same as the size passed in. NOTE: This is a candidate for stable branches. V2: Don't leave silly holes in structure; use unsigned instead of GLuint. V3: Fix missing case for `last` match. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Use a variable for the push constant size in kB.Kenneth Graunke2013-04-041-2/+3
| | | | | | | | | This clarifies that the offset of 2 is actually 16 kB / 8kB units. It also keys both computations off of a single variable, which should make it easier to change in the future. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Turn brw->urb.vs_size and gs_size into local variables.Kenneth Graunke2013-04-043-22/+12
| | | | | | | | These variables are only used within a single function, so we may as well make them local variables. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit.Kenneth Graunke2013-04-043-4/+0
| | | | | | | | This was only produced by the brw_wm_input_dimensions atom, which was removed in the previous commit. So there's no need for the dirty bit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom.Kenneth Graunke2013-04-045-279/+0
| | | | | | | | This was only used to compute proj_attrib_mask, which was removed by the previous commit. That makes this dead code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.Kenneth Graunke2013-04-043-29/+0
| | | | | | | | | The previous commit removed the last user of this field, so there's no longer any point in setting it. Removing this should eliminate state-dependent recompiles, and make the precompile more reliable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove fixed-function texture projection avoidance optimization.Kenneth Graunke2013-04-041-25/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This optimization attempts to avoid extra attribute interpolation instructions for texture coordinates where the W-component is 1.0. Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes state atom (all the brw_vs_constval.c code) needs to run on each draw. It computes the input_size_masks array, then uses that to compute proj_attrib_mask. Differences in proj_attrib_mask can cause state-dependent fragment shader recompiles. We also often fail to guess proj_attrib_mask for the fragment shader precompile, causing us to needlessly compile it twice. Furthermore, this optimization only applies to fixed-function programs; it does not help modern GLSL-based programs at all. Generally, older fixed-function programs run fine on modern hardware anyway. The optimization has existed in some form since the initial commit. When we rewrote the fragment shader backend, we dropped it for a while. Eric readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of an attempt to cure a ~1% performance regression caused by converting the fixed-function fragment shader generation code from Mesa IR to GLSL IR. However, no performance data was included in the commit message, so it's unclear whether or not it was successful. Time has passed, so I decided to re-measure this. Surprisingly, Eric's OpenArena timedemo actually runs /faster/ after removing this and the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a 1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was no statistically significant difference (n = 37). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE.Kenneth Graunke2013-04-041-5/+1
| | | | | | | | This is the same computation as the _WriteEnabled flag, so we may as well use it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+.Kenneth Graunke2013-04-041-1/+1
| | | | | | | | | | | | ctx->Stencil.WriteMask is a statically sized array of 3 elements. Checking it against 0 actually is a NULL check, and can never fail, which meant that we always said stencil writes were enabled. Use the new core Mesa derived state flag to fix this. NOTE: This is a candidate for stable branches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>