| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This requires us to start using the partial clear state. It makes
things quite a bit more complicated but it's still a fairly
straightforward exercise in diagram following.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
Now that we have this field, it's much easier to switch on it than to
walk an if ladder that checks different things.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
We also simplify the way we handle stencil since we know a priori that
it will have ISL_AUX_USAGE_NONE.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The only real change here is that we now reject clear colors for MCS
with certain formats on gen < 9 because we can't trust that the
reinterpretation will work. This may cause some MCS partial resolves.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Our attempts to do it automatically are problematic at best. In order
to really be precise, we need to know both the desired aux usage and
whether or not clear is supported. The current automatic mechanism
doesn't cover this. This commit itself is not a functional change since
it just reworks everything to be in terms of a silly helper. Later
commits will switch things over to more sensible ways of choosing usage.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
We keep the old and possibly broken method of determining aux usage
intact for now. Therefore, the only functional change here is that we
may call finish_render a bit more accurately.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
Multisample surfaces only have a single miplevel so there's no reason to
be passing the extra parameters around. It only leads to confusion.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit changes layer_range_length to return locical layers and also
changes the way we allocate the aux_state field to not allocate extra
layers for MCS. This will be important as we're about to start doing
significantly more detailed tracking of MCS state.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and
there's no reason why we can't do CCS_E on window system buffers so long
as we resolve.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
Nothing created through intel_miptree_create_for_renderbuffer will ever
be exposed externally so there's no need to set FOR_SCANOUT.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
It turns out that if you have rendering in-flight with CCS_E enabled and
you go to do a depth resolve without flushing, the CCS data may never
hit the memory.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
This fixes the Piglit ARB_texture_views rendering-formats test.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.
v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Use the performance warning infrastructure to provide helpful
information when testing applications.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.
v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Rewrite functions, change location of synchronization.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.
v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Update comments, function signatures, and add assertions.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used to load and store clear values from surface state
objects.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Define MCS buffers with any sample count (Jason)
Cc: <[email protected]>
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7. However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering. Let's just play it safe and disallow it for now. If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
|
|
|
|
|
|
| |
(Patch written by Ken, but entirely comments written by Chris.)
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The non-LLC story was a horror show. We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU. Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed. This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.
Not only that, but our rudimentary busy tracking consistented of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO. So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.
This is largely unnecessary - it's just ancient and crufty code. We can
use the same persistent mapping paths on all platforms. On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.
One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO. This will suffer from UC
readback performance now. To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms). Gen4-5
are unfortunately going to be penalized.
v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Chris Wilson pointed out that this mapping really is persistant.
Shouldn't actually have any effect today, but best to set it anyway.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Write-combine mappings give much better performance on writes than
uncached access through the GTT.
Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).
v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
updates, potential for CPU/WC maps failing, and other changes.
v3: (by Ken and Chris Wilson)
(Ken): Rebase on set_domain -> gem_wait
(Chris): Fix up a failed CPU/WC mmaping with a GTT mapping
Not all objects will be mappable for direct access by the CPU
(either using WC/CPU or WC paths), for example, a dmabuf wrapping an
object on a foreign device or an object wrapping access to stolen
memory. Since either the physical pages are not known or even do not
exist, we need to use the mediated, indirect access via the GTT. (If
one day, the kernel does suddenly start providing mediated access
via a regular WB/WC mmapping, we no longer need the fallback.)
v4: Avoid falling back for MAP_RAW (Chris).
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
If the buffer is idle, we I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether. We can't trust the
"idle" flag for external buffers, but for most, it should be fine.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense. Buffers can be in use on both the CPU and GPU at
the same time. In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies. We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).
According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:
1. pins the backing storage (acquiring pages outside of the
struct_mutex)
2. waits either for read/write access, including inter-device waits
3. updates the domain, clflushing as required
4. marks the object as used (for swapping)
5. turns off FBC/PSR/fancy scanout caching
Item (1) is not terribly important. Most BOs are recycled via the
BO cache, so they already have pages. Regardless, we fixed this
via an initial set_domain in the previous patch.
We implement item (2) with I915_GEM_WAIT. This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading. I believe this is pretty uncommon. We
may want to extend the wait ioctl at some point.
Mesa already does item (3) itself. For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent. For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.
We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.
Item (5) should be okay because we avoid cached maps of scanout buffers.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Suggested by Chris Wilson.
v2: Set the write domain to 0 (suggested by Chris).
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
From the ARB_uniform_buffer_object spec:
""shared" uniform blocks, the default layout, ..."
This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This was added in 2d03f48a65a666 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.
Reviewed-by: Samuel Pitoiset <[email protected]>
|