| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In most cases, the width, height, and depth of the physical surface
used by the driver to implement a texture or renderbuffer is equal to
the logical width, height, and depth exposed to the client through
functions such as glTexImage3D(). However, there are two exceptions:
cube maps (which have a physical depth of 6 but a logical depth of 1)
and multisampled renderbuffers (which have larger physical dimensions
than logical dimensions to allow multiple samples per pixel).
Previous to this patch, we accounted for the difference between
physical and logical surface dimensions at inconsistent places in the
call graph (multisampling was accounted for in
intel_miptree_create_for_renderbuffer(), and cubemaps were accounted
for in intel_miptree_create_internal()). As a result, it wasn't
always clear, when calling a miptree creation function, whether
physical or logical dimensions were needed. Also, we weren't
consistent about storing logical dimensions in the intel_mipmap_tree
structure (we only did so in the
intel_miptree_create_for_renderbuffer() code path, and we did not
store depth).
This patch refactors things so that intel_miptree_create_internal() is
responsible for converting logical to physical dimensions and for
storing both the physical and logical dimensions in the
intel_mipmap_tree structure. As a result, all miptree creation
functions interpret their arguments as logical dimensions, and both
physical and logical dimensions are always available to functions that
work with intel_mipmap_trees.
In addition, it renames the fields in intel_mipmap_tree used to store
the dimensions, so that it is clear from the name whether physical or
logical dimensions are being referred to.
This should fix the following bugs:
- When creating a separate stencil surface for a depthstencil cubemap,
we would erroneously try to convert the depth from 1 to 6 twice,
resulting in an assertion failure.
- When creating an MCS buffer for compressed multisampling, we used
physical dimensions instead of logical dimensions, resulting in
wasted memory.
In addition, this should considerably simplify the implementation of
ARB_texture_multisample, because it moves the code to compute the
physical size of multisampled surfaces out of renderbuffer-only code.
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This allows intel_miptree_alloc_mcs() to force Y tiling for the MCS
buffer. Previously we accomplished this by the hack of passing
INTEL_MSAA_LAYOUT_CMS as the msaa_layout parameter, but that parameter
is going to be going away soon.
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
No functional change. This patch moves the compute_msaa_layout()
function earlier in intel_mipmap_tree.c so that it can be used by
other functions in that file.
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously this macro existed in 3 separate places, some inside the
intel driver and some outside of it. It makes more sense to have it
in main/macros.h
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Haswell and later support the GL_FIXED and 2_10_10_10_rev vertex formats
natively, and don't need shader workarounds.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
The new hardware actually just supports this now.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
generate_uniform_pull_constant_load_gen7() is only called on Gen7+, so
the gen < 6 code is dead.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
gl_constants::MaxSamples is an integer, so setting it to 1.0 is just
silly.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes intel_miptree_unmap_etc() (which decompresses ETC
textures to linear) to pay attention to map->x and map->y when writing
to the destination image. Previously these values were ignored,
causing the xoffset and yoffset parameters passed to
glCompressedTexSubImage2D() to be ignored.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every generation except Gen7 creates SURFACE_STATE entries via a
uint32_t array. Only Gen7 uses the older bitfield structure, which we
moved away from because it was less efficient. Convert it for
consistency.
This reduces the compiled size of gen7_wm_surface_state.o by 2.86% in a
release build.
v2: Fix accidental use of BRW_SURFACE_WIDTH/HEIGHT in brw_state_dump.c;
switch back to gen7_set_surface_mcs_info setting surf[6] directly
(both per Eric's review comments).
Acked-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=57842
|
|
|
|
|
|
|
|
| |
Fixes missing break in switch defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The old error reporting was completely bogus, passing _mesa_error() a
format string that didn't even match the remaining arguments. Also,
in many cases the number of dimensions in the TexImage call was not
preserved in the error message (e.g. an error in glTexImage2D was
reported simply as an error in glTexImage).
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dungeon Defenders hits TexImage()'s try_pbo_upload() path where
image->Width == 2, which doesn't meet intelEmitCopyBlit's requirement
that the pitch needs to be a multiple of 4.
Since intelEmitCopyBlit can already fail for a myriad of other reasons,
and it's not clear that other callers are immune to this failure mode,
simply make it return false rather than assert.
Fixes Dungeon Defenders on i965/Ivybridge. Now playable (aside from
having to work around the EXT_bindable_uniform issue).
NOTE: This is probably a candidate for the 9.0 branch.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Improves GLBenchmark 2.1 offscreen performance by 3.2% +/- 1.5% (n=52).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't need them now that our set of parameter pointers points at the
GL core storage for them. This should save memory/bandwidth/overhead in
uniform updates.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
NumParameters used to be an upper bound on the number of vec4s to be
uploaded, which was basically safe (unless your buffer was bound near
the top of address space *and* you array indexed outside the buffer, in
which case I think you might GPU hang). As I migrate the driver away
from ParameterValues[], this is no longer true.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Like in the FS, there's no reason to use an external copy if the
ParameterValues[] relayout of it isn't the layout we need.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
There's no reason to use an external copy if the relayout in the
external copy isn't serving us.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Now that ParameterValues doesn't change across the visitor, we don't
need to go through this.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Things are even more restrictive than they used to be, so I've made
mistakes in this area.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If adding scale parameters during program compile caused a realloc of
ParameterValues, then the driver uniform storage set up by
_mesa_associate_uniform_storage() would point to potentially freed
memory.
Note that this uses TexturesUsed, which may change at runtime for GLSL
when sampler uniforms change. This is a flaw in our handling of texrect
in general, and not one I'm fixing currently.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We don't have native hardware support for these, so they get promoted to
RGBA, in which case we don't have hardware dealing with the channel
swizzling for us.
Fixes piglit EXT_texture_snorm/texwrap formats bordercolor (-swizzled).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I had left this out for a long time because it regressed some
depthstencil-render-miplevels cases when it was enabled. Now that the
bugs causing those are fixed, there's nothing stopping us.
Improves glbenchmark 2.1 offscreen performance by 7.3% +/- 2.8% (n=10).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This worked out before because the parent was always 4 bytes so it
didn't affect the layout, but now we want to support Z16 too.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixing these rendering bugs has been implicated in performance
regressions (which may be unfixable), but at least knowing that it's
happening should help diagnose those regressions.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
The ETC1 changes failed at this, so let's make sure it will be caught in
testing next time.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was caught by the assertion in the next commit. It fixes the
remaining piglit depthstencil-render-miplevels cases, probably by
avoiding broken stencil copies in the validation path.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When comparing to the teximage's format, we have to look at the
format-the-mt-was-created-for not the format-actually-stored-in-the-mt.
Improves glbenchmark 2.1 offscreen test performance 159% +/- 17% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54582
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
Relayout is expensive, so it's something developers (both us and others)
should know about when it happens.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Fixes all the remaining non-Z32F_S8 depthstencil-render-miplevels tests
in piglit.
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_emit_vertices contains special case logic to handle the case where
a vertex shader doesn't read any inputs. This special case logic was
incorrectly activating in the case were the only vertex input is
gl_VertexID. As a result, if a shader used gl_VertexID but used no
other inputs, then all vertices got a gl_VertexID of zero.
Fixes oglconform test "ubo-usage advanced.transform_feedback".
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The rather unweildy logic for determining this condition was repeated
in a large number of places. This patch consolidates it to a single
inline function.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the i965 driver contained code to compute the maximum
number of vertices that could be written without overflowing any
transform feedback buffers. This code wasn't driver-specific, and for
GLES3 support we're going to need to use it in core mesa. So this
patch moves the code into a core mesa function,
_mesa_compute_max_transform_feedback_vertices().
Reviewed-by: Ian Romanick <[email protected]>
v2: Eliminate C++-style variable declarations, since these won't work
with MSVC.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No functional change--this simply paves the way to allow futures
patches to call vbo_count_tessellated_primitives() during error
checking, before the _mesa_prim struct has been constructed.
This will be needed for GLES3, which requires draw calls to fail if
there is not enough space available in transform feedback buffers to
accommodate the primitives to be drawn.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
This change forces the context version to be computed before
initilizing the exec dispatch tables.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
These don't really belong in brw_structs.h.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
struct brw_instruction and the related instruction emitting code won't
be useful on Gen8+, as the instruction encoding changed. However, the
struct brw_reg code is still extremely valuable.
While we're at it, fix up some style points:
- s/GLuint/unsigned/g
- s/GLint/int/g
- s/GLshort/int16_t/g
- s/GLushort/uint16_t/g
- s/INLINE/inline/g
- Replace tabs with spaces
- Put return types on a separate line from the function name/parameters
- Remove trailing whitespace
- Remove extraneous whitespace around function parameters
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Tested-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No statistically significant performance difference on glbenchmark 2.7
(n=60). It reduces cycles spent in the vertex shader by 3.3% +/- 0.8%
(n=5), but that's only about .3% of all cycles spent according to the
fixed shader_time.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way our visitor works, scalar expression/swizzle results that get
stored in channels other than .x will have an intermediate MOV from
their result in the .x channel to the real .y (or whatever) channel, and
similarly for vec2/vec3 results.
By knowing how to adjust DP4-type instructions for optimizing out a
swizzled MOV, we can reduce instructions in common matrix multiplication
cases.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compute-to-mrf code is really twitchy, and it's hard to construct
GLSL testcases for it. This unit test is also really hard to work with
(for example, if your instruction is removed by dead code elimination,
you end up inspecting something irrelevant), but I did use it for
debugging some of the commits to follow.
I called it test_vec4_register_coalesce because the compute-to-mrf code
is about to morph into that.
Reviewed-by: Kenneth Graunke <[email protected]>
|