| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PIPE_BARRIER_UPDATE is defined as:
PIPE_BARRIER_UPDATE_BUFFER | PIPE_BARRIER_UPDATE_TEXTURE
Which means we were flushing for any flags other than these two, but
this was intended to only flush for ssbos and images.
Actually, the driver automatically flushes jobs as we need, including
writes/reads involving SSBOs and images, so we don't really need to
flush anything when the program emits a barrier. However, this may
lead to excessive flushing in some cases, so we will soon change this
to avoid atutomatic flushing of the current job for SSBOs and images,
meaning that we will rely on the application to emit correct memory
barriers for these that we should make sure to process here.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive
counts to a buffer, including the number of primives written to transform
feedback buffers, which will handle buffer overflow correctly.
There are a couple of caveats with this:
Primitive counters are reset when we emit a 'Tile Binning Mode Configuration'
packet, which can happen in the middle of a primitives query, so we need to
read the buffer when we submit a job and accumulate the counts in the context
so we don't lose them.
We also need to do the same when we switch primitive type during transform
feedback so we can compute the correct number of recorded vertices from
the number of primitives. This is necessary so we can provide an accurate
vertex count for draw from transform feedback.
v2:
- When computing the number of vertices for a primitive, pass in the base
primitive, since that is what the hardware will count.
- No need to update primitive counts when switching primitive types if
the base primitives are the same.
- Log perf warning when mapping the primitive counts BO for readback (Eric).
- Only emit the primitive counts packet once at job end (Eric).
- Use u_upload mechanism for the primitive counts buffer (Eric).
- Use the XML to generate indices into the primitive counters buffer (Eric).
Fixes piglit tests:
spec/ext_transform_feedback/overflow-edge-cases
spec/ext_transform_feedback/query-primitives_written-bufferrange
spec/ext_transform_feedback/query-primitives_written-bufferrange-discard
spec/ext_transform_feedback/change-size base-shrink
spec/ext_transform_feedback/change-size base-grow
spec/ext_transform_feedback/change-size offset-shrink
spec/ext_transform_feedback/change-size offset-grow
spec/ext_transform_feedback/change-size range-shrink
spec/ext_transform_feedback/change-size range-grow
spec/ext_transform_feedback/intervening-read prims-written
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits. Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.
This seems like a pretty big assumption to make. Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).
The i965 driver performs actions on these memory barriers. Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache). All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible). The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.
I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.
Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.
Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
|
|
|
|
|
| |
The compiler has its limits under V3D_* (like most V3D stuff), so sync up
with that.
|
|
|
|
|
| |
So far I assume that all the buffers get written. If they weren't, you'd
probably be using UBOs instead.
|
| |
|
|
|
|
|
|
|
| |
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Shaders are usually quite short, and are private to the context. We can
save memory and reduce the work the kernel needs to do at exec time by
packing them together in a stream uploader for long-lived state.
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
These describe what the fields mean in RCL generation. "resolve" is left
over from VC4, and sounds like MSAA resolves (which may or may not be
involved in the store we generate).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For V3D, the HW will interpolate slightly differently along the shared
edge of the trifan. The conformance tests manage to catch this in the
nearest_consistency_* group. To get interpolation to match, we need the
last vertex of the triangle to be shared.
I first tried implementing draw_rectangle to do triangles instead, but
that was quite a bit (147 lines) of code duplication from u_blitter, and
this seems much simpler and less likely to break as u_blitter changes.
Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
This is the final step of the driver rename.
|
|
|