| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
The derived state approach currently used (_RestartIndex) doesn't work:
in the GL_PRIMITIVE_RESTART_FIXED_INDEX case, the restart index depends
on the index buffer's data type, and that isn't known until draw time.
The existing code also fails to obey the GL 4.3 rules which say that
FIXED_INDEX takes precedence over normal primitive restart.
This helper function correctly determines the restart index, and will
replace the derived state.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
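
A minimal sketch of the draw-time lookup described above. The enum, struct, and
function names are hypothetical stand-ins, not the names used in the tree; the
fixed values are the largest value each index type can hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GL index types and the relevant state. */
enum index_type { INDEX_U8, INDEX_U16, INDEX_U32 };

struct restart_state {
   bool fixed_index_enabled;     /* GL_PRIMITIVE_RESTART_FIXED_INDEX */
   uint32_t user_restart_index;  /* value set by the application */
};

/* Pick the restart index at draw time, once the index type is known. */
static uint32_t
restart_index_for_draw(const struct restart_state *st, enum index_type type)
{
   if (st->fixed_index_enabled) {
      /* FIXED_INDEX takes precedence over the application-set index. */
      switch (type) {
      case INDEX_U8:  return 0xffu;
      case INDEX_U16: return 0xffffu;
      case INDEX_U32: return 0xffffffffu;
      }
      assert(!"unknown index type");
   }
   return st->user_restart_index;
}
```

Because the index type is a parameter, nothing has to be cached as derived
state before the draw call arrives.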
| |
The derived _PrimitiveRestart enable flag combines the PrimitiveRestart
and PrimitiveRestartFixedIndex enable flags. However, DrawArrays is not
supposed to do FixedIndex restart:
From the OpenGL 4.3 Core specification, section 10.3.5 (page 302):
"If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not
taking a type parameter, including all of the *Draw* commands other
than *DrawElements*."
The OpenGL ES 3.0 specification agrees by omission:
"When DrawElements, DrawElementsInstanced, or DrawRangeElements
transfers a set of generic attribute array elements to the GL..."
Notably, DrawArrays is not included in the list of draw calls that
take PRIMITIVE_RESTART_FIXED_INDEX into consideration.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
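
A literal reading of the quoted GL 4.3 sentence, as a sketch. The struct and
function names are hypothetical, and handling of plain PRIMITIVE_RESTART for
non-indexed draws is assumed to simply follow its own enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct restart_enables {
   bool primitive_restart;              /* GL_PRIMITIVE_RESTART */
   bool primitive_restart_fixed_index;  /* GL_PRIMITIVE_RESTART_FIXED_INDEX */
};

/*
 * draw_has_type_param is true for DrawElements-style calls and false for
 * DrawArrays-style calls, which take no index type parameter.
 */
static bool
restart_enabled_for_draw(const struct restart_enables *en,
                         bool draw_has_type_param)
{
   assert(en != NULL);
   if (en->primitive_restart_fixed_index)
      return draw_has_type_param;  /* no restart for DrawArrays and friends */
   return en->primitive_restart;
}
```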
| |
Previously it would fail an assertion in debug builds (though the correct
value was returned in a non-debug build). Marking it as a candidate for
stable even though it has no current consumers in the stable branches, in
case one shows up in a later backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64727
NOTE: This is a candidate for stable branches.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
We were expanding the live range too far, breaking register_coalesce_2()
and compute_to_mrf() on 16-wide shaders. Turning it back on improves
GLB2.7 performance by 0.239355% +/- 0.0850649% (n=398). shader-db stats
are:
total instructions in shared programs: 1627211 -> 1609262 (-1.10%)
instructions in affected programs: 450351 -> 432402 (-3.99%)
While 33 new 16-wide shaders are gained, 70 are lost. Despite that,
tropics (the app that lost the most 16-wide) shows a .41% +/- .16%
(n=7/8, first-run outlier removed) performance improvement on my HSW.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
The scheduler didn't know about uniform-type accesses, and if a uniform
access was last in a 16-wide, we'd walk off the end of the array. This
never happened, because we'd never coalesce out all the GRFs, due to a bug
to be fixed in the next commit.
Reviewed-by: Kenneth Graunke <[email protected]>
| |
i965 and radeon use ra_set_node_reg() to force payload registers to
specific registers while still exposing those registers to the allocator.
We were treating those register nodes as unsuccessfully allocated in the
ra_simplify() step, leading to walking the registers again to do
optimistic coloring even if there was nothing left to do.
Acked-by: Kenneth Graunke <[email protected]>
| |
Since the introduction of default-to-SARGB8 window system framebuffers,
non-blorp hardware lost blit acceleration for these two paths between the
window system and ARGB8888 textures. Since we shouldn't be doing any
conversion anyway, just compatibility-check the linear variants of the
formats.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61954
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Tobias Jakobi <[email protected]>
| |
Since the glBitmap() MRT change, it's unused. There was basically no way
to responsibly use this function since MRT was introduced.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
Any 32-bit format got ARGB8888 handling (including, say, GL_RG1616), and
anything else got 16-bit (including, say, GL_R8), which could potentially
hang the GPU by writing out of bounds.
NOTE: This is a candidate for the stable branches.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
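
One way to avoid the "everything else is 16-bit" trap is to list the handled
formats explicitly and refuse the rest. A sketch under that assumption; the
format enum and function name are hypothetical and far shorter than a real
driver's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical format list; a real driver has many more entries. */
enum pixel_format { FMT_ARGB8888, FMT_XRGB8888, FMT_RGB565, FMT_RG1616, FMT_R8 };

enum blit_depth { BLIT_DEPTH_16_565, BLIT_DEPTH_32 };

/* Returns false for any format the fast path does not explicitly know. */
static bool
blit_depth_for_format(enum pixel_format fmt, enum blit_depth *out)
{
   assert(out != NULL);
   switch (fmt) {
   case FMT_ARGB8888:
   case FMT_XRGB8888:
      *out = BLIT_DEPTH_32;
      return true;
   case FMT_RGB565:
      *out = BLIT_DEPTH_16_565;
      return true;
   default:
      /* e.g. RG1616 or R8: the caller must fall back to another path. */
      return false;
   }
}
```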
| |
We'd only hit color buffer 0 even if multiple draw buffers were bound.
NOTE: This is a candidate for the stable branches.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
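
The shape of the fix, sketched with hypothetical types: iterate over every
bound color draw buffer instead of touching only index 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DRAW_BUFFERS 8  /* hypothetical limit for this sketch */

struct color_buffer {
   uint32_t last_clear_value;
   int cleared;
};

struct framebuffer {
   unsigned num_color_draw_buffers;
   struct color_buffer *color_draw_buffers[MAX_DRAW_BUFFERS];
};

/* Clears every bound draw buffer and returns how many were cleared. */
static unsigned
clear_color_buffers(struct framebuffer *fb, uint32_t value)
{
   unsigned n = 0;

   assert(fb->num_color_draw_buffers <= MAX_DRAW_BUFFERS);
   for (unsigned i = 0; i < fb->num_color_draw_buffers; i++) {
      struct color_buffer *cb = fb->color_draw_buffers[i];
      if (cb == NULL)
         continue;  /* slot with no buffer bound */
      cb->last_clear_value = value;
      cb->cleared = 1;
      n++;
   }
   return n;
}
```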
| |
This will ensure that we have resolves if we ever extend this to
glTexSubImage(), and fixes missing image start offset handling.
The texture buffer alloc ended up getting moved up, because we want to
look at the format of the image's actual mt to see if we'll end up
blitting the right thing, in the case of packed depth/stencil uploads.
This is the last caller of intelEmitCopyBlit() on a miptree-wrapped BO.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
The previous code was missing depth resolves; that had only been harmless
because Y-tiled buffers could not be blitted. The pair of flip args in the
new blit function means that we can just drop the pack->Invert fallback.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
I needed to do this for the PBO blit cases to use intel_miptree_blit().
But this also actually partially fixes a bug in EGLImage handling: We
can't share regions across contexts, because regions have a refcount that
isn't protected by a mutex, and different contexts can be simultaneously
accessed from multiple threads. Now we just need to get regions out of
__DRIImage. There was also a missing use of image->offset in the EGLImage
renderbuffer storage code.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
In a bit of debug code, we no longer have the inter-slice x/y to print.
But I think the level/slice is more useful in this case for looking at
what's getting mapped, especially given that INTEL_DEBUG=blit will tell
you the other value.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
While this is a bit more CPU work, it also is less code to handle this
path, and fixes problems with 32k-pitch textures and missing resolves.
v2: Add error checking in new code.
Reviewed-and-tested-by: Ian Romanick <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Acked-by: Paul Berry <[email protected]>
| |
For a blit-uploaded temporary, it's faster on current hardware to memcpy
the data into a linear CPU mapping than to go through the GTT.
v2: Turn the not-fully-supported mask into 3 supported enum values.
Reviewed-and-tested-by: Ian Romanick <[email protected]> (v1)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Paul Berry <[email protected]> (v2)
Reviewed-by: Chad Versace <[email protected]> (v2)
| |
This is just in case someone else trips over this due to our weird reuse
of this code in glBlitFramebuffer().
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
If the hw is pre-gen5 and can't blit depth, it'll cleanly error out.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
I think we've measured no performance difference from this in the past,
except that the blorp code can do things like multisample resolves.
Prevents piglit regression in the next commit when a testcase started
trying to do a multisampled resolve through the old glCopyTexSubImage()
path.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
We were protected for a long time by the fact that depth was Y tiled and
you couldn't blit Y. Now that we can blit Y, we were failing to resolve
depth in glCopyPixels().
Note in the comment about swrast, that the swrast map path does resolves
appropriately already.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
I had previously asserted that it was hard to write a useful, simpler
blit function, but I think this might be it.
This has the side effect of extending the 32k pitch check to a few more
places that were missing it.
v2: Update comment for being moved inside intel_miptree_blit().
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
This makes it more consistent with intel_miptree_get_tile_offsets().
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
Right now, the callers in i965 don't expect a nonzero page offset to
actually occur (since that's being handled elsewhere), but it seems
like a trap to leave it this way.
Reviewed-and-tested-by: Ian Romanick <[email protected]>
Acked-by: Paul Berry <[email protected]>
| |
The problem is the sampler units are allocated from the same pool for all
shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
shader samplers start at index 12, leaving only 4 sampler units
for the fragment shader. The main cause is probably the fact that samplers
(texture unit -> sampler unit mapping, etc.) are tracked globally
for an entire program object.
This commit adapts the GLSL linker and core Mesa such that the sampler units
are assigned to sampler uniforms for each shader stage separately
(if a sampler uniform is used in all shader stages, it may occupy a different
sampler unit in each, and vice versa, an i-th sampler unit may refer to
a different sampler uniform in each shader stage), and the sampler-specific
variables are moved from gl_shader_program to gl_shader.
This doesn't require any driver changes, and it fixes piglit/max-samplers
for gallium and classic swrast. It also works with any number of shader
stages.
v2: - converted tabs to spaces
- added an assertion to _mesa_get_sampler_uniform_value
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
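
The per-stage assignment can be illustrated with a toy model. This is a
sketch, not the linker code: the struct layout and function name are
hypothetical, and only two stages are modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COUNT };

struct sampler_uniform {
   bool used[STAGE_COUNT];  /* which stages reference this sampler */
   int unit[STAGE_COUNT];   /* unit assigned in each stage, -1 if unused */
};

/*
 * Assign units for one stage using a counter private to that stage, so
 * units consumed by another stage do not shrink this stage's pool.
 * Returns the number of units the stage ends up using.
 */
static int
assign_sampler_units(struct sampler_uniform *u, int count, enum stage stage)
{
   int next = 0;

   assert(count >= 0);
   for (int i = 0; i < count; i++)
      u[i].unit[stage] = u[i].used[stage] ? next++ : -1;
   return next;
}
```

With twelve samplers used by the vertex shader, a fragment-shader sampler
still starts at unit 0 in its own stage rather than at unit 12, and a sampler
shared by both stages may land on a different unit in each.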
| |
to match the size of ctx->Texture.Unit, and it will also fix
piglit/max-samplers with the following commit.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Some Gallium drivers were crashing, because the array was not large enough.
v2: clamp the per-shader maximum in st/mesa, then sum them all up
NOTE: This is a candidate for the stable branches.
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64934
NOTE: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <[email protected]>
| |
It turns out the MI_LOAD_REGISTER_IMM approach doesn't work on Haswell,
and regressed essentially all the transform feedback Piglit tests.
This morally reverts eaa6fbe6d54dc99efac4ab8e800edef65ce8220d. However,
the code is still simpler than it was. On BeginTransformFeedback, we
simply flush the batch and set the SOL reset flag so that the next batch
will start with zeroed offsets. There's still no software counting.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64887
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Enables guardband clipping when the viewport covers the entire render
target.
No piglit regressions on Ironlake.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
| |
Relaxes the validation of
OPTION ARB_precision_hint_{nicest,fastest};
to allow duplicate options. The spec says that both /nicest/ and
/fastest/ cannot be specified together, but could be interpreted
either way for respecification of the same option.
Other drivers (NVIDIA etc) accept this, and at least one Unity3D game
expects it to succeed (Kerbal Space Program).
V2: Add spec quote.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
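
The relaxed rule, sketched with hypothetical names (this is not the program
parser's actual code): a repeated option is accepted, a conflicting one is
still rejected.

```c
#include <assert.h>
#include <stdbool.h>

enum precision_hint { HINT_NONE, HINT_NICEST, HINT_FASTEST };

/* Returns false only when the new option conflicts with one already set. */
static bool
set_precision_hint(enum precision_hint *current, enum precision_hint requested)
{
   assert(requested != HINT_NONE);
   if (*current != HINT_NONE && *current != requested)
      return false;        /* nicest and fastest together: still an error */
   *current = requested;   /* first use, or a harmless duplicate */
   return true;
}
```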
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59440
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
| |
The hardware does it, so no need for this workaround.
Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
| |
This should already be handled by _mesa_base_tex_format() calls in
TexImage*.
| |
Most of the work in BeginTransformFeedback is only necessary on Gen6.
We may as well just skip it on Gen7+.
v2: Add an intel->gen == 6 assert.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
Now that we have hardware contexts, we don't need to continually
reprogram the GS_SVBI_INDEX registers. They're automatically saved and
restored with the context, so they can just increment over time. We
only need to reset them when starting transform feedback.
There's also no reason to delay until the next drawing operation; we can
just emit the packet immediately. However, this means we must drop the
initialization in brw_invariant_state, as BeginTransformFeedback may
occur before the first drawing in a context.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
EXT_transform_feedback isn't yet supported on Gen4-5, so none of this
query code is actually used. This also means we can remove some of the
surrounding support code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
This was only used for the non-hardware context code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
We can just do it ourselves with MI_LOAD_REGISTER_IMM.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
Failing to get a hardware context now means failing to load the driver,
so this code will never get hit.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
Using a function-like macro makes it easy to loop over all four streams.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
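
For illustration, a function-like register macro and the loop it enables.
The macro name, base offset, and stride below are placeholders chosen for the
sketch, not values taken from the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder base and stride, NOT real register offsets. */
#define EXAMPLE_COUNTER_BASE    0x1000u
#define EXAMPLE_COUNTER_STRIDE  8u
#define EXAMPLE_COUNTER(n) \
   (EXAMPLE_COUNTER_BASE + (n) * EXAMPLE_COUNTER_STRIDE)

#define NUM_STREAMS 4

/* One loop replaces four near-identical statements, one per stream. */
static void
collect_counter_offsets(uint32_t offsets[NUM_STREAMS])
{
   assert(offsets != NULL);
   for (unsigned i = 0; i < NUM_STREAMS; i++)
      offsets[i] = EXAMPLE_COUNTER(i);
}
```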
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64745
Note: This is a candidate for the stable branches.
Reviewed-by: Jose Fonseca <[email protected]>
| |
meta.h should be included in brw_state_upload.c to get access to
function _mesa_meta_in_progress().
| |
Now that we have hardware contexts and can use MI_STORE_REGISTER_MEM,
we can use the GPU's pipeline statistics counters rather than going out
of our way to count primitives in software.
Aside from being simpler, this also paves the way for Geometry Shaders,
which can output an arbitrary number of primitives on the GPU. It will
also allow us to use hardware primitive restart when these queries are
in use.
The GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query is easy: it
corresponds to the SO_NUM_PRIMS_WRITTEN/SO_NUM_PRIMS_WRITTEN0_IVB
counters.
The GL_PRIMITIVES_GENERATED query is trickier. Gen provides several
statistics registers which /almost/ match the semantics required:
- IA_PRIMITIVES_COUNT
The number of primitives fetched by the VF or IA (input assembler).
This undercounts when GS is enabled, as it can output many primitives.
- GS_PRIMITIVES_COUNT
The number of primitives output by the GS. Unfortunately, this
doesn't increment unless the GS unit is actually enabled, and it
usually isn't.
- SO_PRIM_STORAGE_NEEDED*_IVB
The amount of space needed to write primitives output by transform
feedback. These naturally only work when transform feedback is on.
We'd also have to add the counters for all four streams.
- CL_INVOCATION_COUNT
The number of primitives processed by the clipper. This doesn't work
if the GS or SOL throw away primitives for rasterizer discard.
However, it does increment even if the clipper is in REJECT_ALL mode.
Dynamically switching between counters would be painfully complicated,
especially since GS, rasterizer discard, and transform feedback can all
be switched on and off repeatedly during a single query.
The most usable counter is CL_INVOCATION_COUNT. The previous two
patches reworked rasterizer discard support so that all primitives hit
the clipper, making this work.
v2: Occlusion query bug fixes removed and squashed in earlier patches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
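
The general snapshot pattern behind a counter-based query, as a CPU-only
sketch. All names are hypothetical; the counter argument stands in for a
value the GPU would store to memory at the begin and end of the query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct prim_query {
   uint64_t begin_snapshot;
   uint64_t end_snapshot;
   bool active;
};

/* Record the free-running counter when the query starts. */
static void
query_begin(struct prim_query *q, uint64_t counter)
{
   assert(!q->active);
   q->begin_snapshot = counter;
   q->active = true;
}

/* Record it again when the query ends. */
static void
query_end(struct prim_query *q, uint64_t counter)
{
   assert(q->active);
   q->end_snapshot = counter;
   q->active = false;
}

/* The result is the difference; no per-draw software counting is needed. */
static uint64_t
query_result(const struct prim_query *q)
{
   assert(!q->active);
   return q->end_snapshot - q->begin_snapshot;
}
```

Since the counter never has to be reset, the same register can serve nested
or back-to-back queries as long as each one keeps its own pair of snapshots.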
| |
This has more of a negative impact than the previous patch, as on Gen6
passing primitives through to the clipper means we actually have to make
the GS thread write them to the URB.
I don't see another good solution though, and rasterizer discard is not
the most common of cases, so hopefully it won't be too terrible.
v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags;
remove the rasterizer_discard field from brw_gs_prog_key.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> [v1]
Reviewed-by: Paul Berry <[email protected]>
| |
In order to implement the GL_PRIMITIVES_GENERATED query in a sane
fashion on our hardware, we can't discard primitives until the clipper.
The patch after next explains the rationale.
By setting the clipper to REJECT_ALL mode, all primitives get thrown away,
so rendering is still appropriately disabled.
This may negatively impact performance in the rasterizer discard case,
but it's unclear how much and this hasn't been observed to be a
bottleneck in any application we've looked at. The clipper is the very
next stage in the pipeline, so I don't think it will be terrible.
v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
| |
We don't currently use the clipper statistics, but we'll soon use
CL_INVOCATION_COUNT to implement the GL_PRIMITIVES_GENERATED query.
The number of primitives generated is not supposed to be altered during
operations such as glGenerateMipmap.
Prevents spec/EXT_transform_feedback/generatemipmap prims_generated
from breaking when we start using pipeline statistics registers to
implement the GL_PRIMITIVES_GENERATED query in a few commits.
v2: Use the BRW_NEW_META_IN_PROGRESS flag for correct state handling.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> [v1]
Reviewed-by: Paul Berry <[email protected]>
|