| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
There's no need to pass the target, level and texObj parameters since
they can be easily obtained from the texImage pointer.
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
On failure, intel_miptree_create() needs to *release* the miptree, not
just free it, so that the stencil_mt gets released too.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
| |
This saves a couple of instructions on most programs with control
flow. More interestingly, 6 shaders from unigine sanctuary now fit
into 16-wide without register spilling.
|
|
|
|
|
| |
There's nothing batchbuffer-related here. State updates by the caller
will trigger re-emitting of any new hardware state.
|
|
|
|
|
| |
There should be nothing special about this call compared to other
callers of intel_draw_buffer().
|
|
|
|
|
|
|
| |
We were printing out the line triggering the flush, but a variety of
different causes just printed the line number for intel_flush()'s call
of intel_batchbuffer_flush(). Plumb the line numbers from the caller
of intel_flush() on through.
|
|
|
|
|
|
|
|
|
|
|
| |
Since the refactor in d7b33309fe160212f2eb73f471f3aedcb5d0b5c1, depth
in the miptree changed from 1 to 6, so we always decided it didn't
match, and we would relayout to something that would still not
"match".
Improves performance 23.8% (+/- 1.1%, n=4)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43329
|
|
|
|
|
| |
We would have done a relayout at validate time, but it's senseless to
store into a miptree if it's going to force relayout.
|
|
|
|
|
|
|
|
| |
bitset.h is still used by classic nouveau -- see `git grep '\<BITSET_'`
-- and the state stored is too big to fit in 64bit integers (it requires
approximately 87 bits), so there is no obvious alternative here.
This effecively reverts commit 196800d79829a420073f762fac90090a7b416d2d.
|
|
|
|
| |
Signed-off-by: Mathias Froehlich <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Mathias Froehlich <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Mathias Froehlich <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Mathias Froehlich <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider the following code:
MOV A.x, B.x
MOV B.x, C.x
After the first line, cur_value[A][0] == B, indicating that A.x's
current value came from register B.
When processing the second line, we update cur_value[B][0] to C.
However, for drect copies, we fail to reset cur_value[A][0] to NULL.
This is necessary because the value of A is no longer the value of B.
Fixes Counter-Strike: Source in Wine (where the menu rendered completely
black in DX9 mode), completely white textures in Civilization V, and the
new Piglit test glsl-vs-copy-propagation-1.shader_test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42032
Tested-by: Matt Turner <[email protected]>
Tested-by: Christopher James Halse Rogers <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this code, 'i' loops over the number of virtual GRFs, while 'j' loops
over the number of vector components (0 <= j <= 3).
It can't possibly be correct to see if bit 'i' is set in the destination
writemask, as it will have values much larger than 3. Clearly this is
supposed to be 'j'.
Found by inspection.
Tested-by: Matt Turner <[email protected]>
Tested-by: Christopher James Halse Rogers <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Android IceCreamSandwich, SurfaceFlinger requires GL_OES_image_external
for basic compositing tasks. Without the extension, SurfaceFlinger fails
to start.
Despite the incompleteness of the extension's implementation introduced by
this patch, it is good enough to enable SurfaceFlinger and to unblock the
people who need to begin testing Mesa on IceCreamSandwich.
To enable the extension, set the environment variable
MESA_EXTENSION_OVERRIDE="+GL_OES_EGL_image_external". Ideally, Android
should set this in init.rc.
WARNING: This implementation of GL_OES_EGL_image_external is not complete.
Some of it is even incorrect. When we begin to really implement
GL_OES_EGL_image_external, much of the patch will need reverting.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here is the final patch to enable dynamic eu instruction store size:
increase the brw eu instruction store size dynamically instead of just
allocating it statically with a constant limit. This would fix something
that 'GL_MAX_PROGRAM_INSTRUCTIONS_ARB was 16384 while the driver would
limit it to 10000'.
v2: comments from ken, do not hardcode the eu limit to (1024 * 1024)
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A single next_insn may change the base address of instruction store
memory(p->store), so call it first before referencing the instruction
store pointer from an index.
This the final prepare work to enable the dynamic store size.
v2: comments from Ken, define emit_endif as bool type
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dynamic instruction store size is enabled, while after the brw_JMPI()
and before the brw_land_fwd_jump() function, the eu instruction store
base address(p->store) may change. Thus, the safe way to reference the
jmp instruction is by index instead of by the instruction address.
v2: comments from Eric, don't change the prototype of brw_JMPI
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dynamic instruction store size is enabled, while after
the brw_IF/ELSE() and before the brw_ENDIF() function, the
eu instruction store base address(p->store) may change.
Thus let if_stack just store the instruction index. This is
somehow more flexible and safe than store the instruction
memory address.
Signed-off-by: Yuanhan Liu <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When updating SOL indices, we were accidentally putting the starting
index in dword 1 and the SVBI number to increment in dword 2--these
should be reversed. Usually both of these values are zero, so we
didn't see any problem. However, if a transform feedback operation
spans multiple batch buffers, the starting index will be nonzero.
Fixes piglit test "EXT_transform_feedback/intervening-read output".
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When rendering triangle strips, vertices come down the pipeline in the
order specified, even though this causes alternate triangles to have
reversed winding order. For example, if the vertices are ABCDE, then
the GS is invoked on triangles ABC, BCD, and CDE, even though this
means that triangle BCD is in the reverse of the normal winding order.
The hardware automatically flags the triangles with reversed winding
order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided
coloring can be adjusted to account for the reversed order.
In order to ensure that winding order is correct when streaming
vertices out to a transform feedback buffer, we need to alter the
ordering of BCD to BDC when the first provoking vertex convention is
in use, and to CBD when the last provoking vertex convention is in
use.
To do this, we precompute an array of indices indicating where each
vertex will be placed in the transform feedback buffer; normally this
is SVBI[0] + (0, 1, 2), indicating that vertex order should be
preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we
change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,
2), depending on the provoking vertex convention.
Fixes piglit tests "EXT_transform_feedback/tessellation
triangle_strip" on Gen6.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
No longer used anywhere.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
The former was only used for clearing buffers. The later wasn't used
anywhere! Remove them and all implementations of those functions.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
Fixes piglit tesselation triangle_strip flat_last.
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes almost all of the transform feedback piglit tests. Remaining
are a few tests related to tesselation for
quads/trifans/tristrips/polygons with flat shading.
v2: Incorporate Paul's feedback (squash with previous, state flag note,
static assert, update FINISHME)
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
| |
We'll be growing more code in here as we actually enable the unit.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
| |
v2: Make the buffer enable bitfield take an index argument.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The code was relying on gs.prog_data's copy of the
number-of-verts-per-prim, which segfaulted on gen7 since it doesn't
make a GS program. We can easily calculate that value right here.
v2: Fix svbi_0_starting_index regression.
Reviewed-by: Paul Berry <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although there is not much documentation of this fact, there are in
fact two separate VF caches:
- an "index-based" cache (described in the Sandy Bridge PRM, vol 2
part 1, section 2.1.2 "Vertex Cache"). This cache stores URB
handles of vertex shader outputs; its purpose is to avoid redundant
invocations of the vertex shader when drawing in random access mode
(e.g. glDrawElements()), and the same vertex index is specified
multiple times. It is automatically invalidated between
3D_PRIMITIVE commands and between instances within a single
3D_PRIMITIVE command.
- an "address-based" cache (mentioned briefly in vol 2 part 1, section
1.7.4 "PIPE_CONTROL Command"). This cache stores the data read from
vertex buffers; its purpose is to avoid redundant memory accesses
when doing instanced drawing or when multiple 3D_PRIMITIVE commands
access the same vertex data. It needs to be manually invalidated
whenever new data is written to a buffer that is used for vertex
data.
Previous to this patch, it was not necessary for Mesa to explicitly
invalidate the address-based cache, because there were no reasonable
use cases in which the GPU would write to a vertex data buffer during
a batch, and inter-batch flushing was taken care of by the kernel.
However, with transform feedback, there is now a reasonable use case:
vertex data is written to a buffer using transform feedback, and then
that data is immediately re-used as vertex input in the next drawing
operation. To make this use case work, we need to flush the
address-based VF cache between transform feedback and the next draw
operation. Since we are already calling
intel_batchbuffer_emit_mi_flush() when transform feedback completes,
and intel_batchbuffer_emit_mi_flush() is intended to invalidate all
caches, it seems reasonable to add VF cache invalidation to this
function.
As with commit 63cf7fad13fc9cfdd2ae7b031426f79107000300 (i965: Flush
pipeline on EndTransformFeedback), this is not an ideal solution. It
would be preferable to only invalidate the VF cache if the next draw
call was about to consume data generated by a previous draw call in
the same batch. However, since we don't have the necessary dependency
tracking infrastructure to figure that out right now, we have to
overzealously invalidate the cache.
Fixes Piglit test "EXT_transform_feedback/immediate-reuse".
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
After creating new binding table entries for transform feedback, we
need to set the dirty flag BRW_NEW_SURFACES, so that a new binding
table pointer will be sent to the hardware. Otherwise the new binding
table entries will not take effect.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The surface states tracked by BRW_NEW_WM_SURFACES are no longer used
for just WM. They are also used for vertex texturing and transform
feedback. To avoid confusion, this patch renames BRW_NEW_WM_SURFACES
to BRW_NEW_SURFACES.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
X8 depth formats weren't supported until Ironlake (Gen 5).
Fixes GPU hangs introduced in d84a180417d1eabd680554970f1eaaa93abcd41e.
One example test case was "fbo-missing-attachment-blit from".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Although i965 gen6 does not yet support ARB_transform_feedback2 or
NV_transform_feedback2, it needs to support pause/resume functionality
so that meta-ops will work correctly.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
The codegen backends all had this same tracking, so just do it at the
EU level.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
| |
The EU code itself can just do this work, since all the consumers were
duplicating it.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
| |
This is a similar cleanup to what we did for brw_IF(), brw_ELSE(),
brw_ENDIF() handling.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
| |
The branch distances get patched up later at the WHILE instruction.
Reviewed-by: Yuanhan Liu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This makes it easier to keep track of which dirty bits correspond to
which pieces of context, since it makes _NEW_RASTERIZER_DISCARD
correspond with ctx->RasterDiscard.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were storing the RasterDiscard flag (for
GL_RASTERIZER_DISCARD) in gl_context::TransformFeedback. This was
confusing, because we use the _NEW_TRANSFORM flag (not
_NEW_TRANSFORM_FEEDBACK) to track state updates to it, and because
rasterizer discard has effects even when transform feedback is not in
use.
This patch makes RasterDiscard a toplevel element in gl_context rather
than a subfield of gl_context::TransformFeedback.
Note: We can't put RasterDiscard inside gl_context::Transform, since
all items inside gl_context::Transform need to be pieces of state that
are saved and restored using PushAttrib and PopAttrib.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we only enabled transform feedback when
MESA_GL_VERSION_OVERRIDE was 3.0 or greater, since transform feedback
support was not completely finished, so it didn't make sense to
advertise support for it unless absolutely necessary.
Now that transform feedback is fully implemented on gen6, we can
enable this extension unconditionally.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds software-based PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries that work by keeping
track of the number of primitives that are sent down the pipeline, and
adjusting as necessary to account for the way each primitive type is
tessellated.
In the long run we'll want to replace this with a hardware-based
implementation, because the software approach won't work with geometry
shaders or primitive restart. However, at the moment, we don't have
the necessary kernel support to implement a hardware-based query (we
would need the kernel to save GPU registers when context switching, so
that drawing performed by another process doesn't get counted).
Fixes Piglit tests EXT_transform_feedback/query-primitives_generated-*
and EXT_transform_feedback/query-primitives-written-*.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, i965 only supported two query types: GL_TIME_ELAPSED_EXT
and GL_SAMPLES_PASSED_ARB, and it distinguished between the two using
if/else statements that compared query->Base.Target to
GL_TIME_ELAPSED_EXT.
This patch changes the if/else statements to switch statements so that
we can add more query types without having to have a chain of
else-ifs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't currently have kernel support for saving GPU registers on a
context switch, so if multiple processes are performing transform
feedback at the same time, their SVBI registers will interfere with
each other. To avoid this situation, we keep a software shadow of the
state of the SVBI 0 register (which is the only register we use), and
re-upload it on every new batch.
The function that updates the shadow state of SVBI 0 is called
brw_update_primitive_count, since it will also be used to update the
counters for the PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables rasterizer discard functionality (a part of
transform feedback) in Gen6, by generating an alternate GS program
when rasterizer discard is active. Instead of forwarding vertices
down the pipeline, the alternate GS program uses a URB Write message
to deallocate the URB entry that was allocated by FF sync and
terminate the thread.
Note: parts of the Sandy Bridge PRM seem to imply that we could do
this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit,
and not allocating a URB entry at all. However, it's not clear how we
are supposed to terminate the thread if we do that. Volume 2 part 1,
section 4.5.4, says "GS threads must terminate by sending a URB_WRITE
message with the EOT and Complete bits set.", and my experiments so
far confirm that.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Paul Berry <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A common use case for transform feedback is to perform one draw
operation that writes transform feedback output to a buffer, followed
by a second draw operation that consumes that buffer as vertex input.
Since vertex input is consumed at an earlier pipeline stage than
writing transform feedback output, we need to flush the pipeline to
ensure that the transform feedback output is completely written before
the data is consumed.
In an ideal world, we would do some dependency tracking, so that we
would only flush the pipeline if the next draw call was about to
consume data generated by a previous draw call in the same batch.
However, since we don't have that sort of dependency tracking
infrastructure right now, we just unconditionally flush the buffer
every time glEndTransformFeedback() is called. This will cause a
performance hit compared to the ideal case (since we will sometimes
flush the pipeline unnecessarily), but fortunately the performance hit
will be confined to circumstances where transform feedback is in use.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|