| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f2cc0c82b7c9af91502c2814fdc8e67.
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.
Fixes: d21c086d819d78fb3f6abcbb14aa492970f442aa (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice. Most of i965 is fine with this but it breaks blorp. This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path. This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of directly using intel_obj->buffer. Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take. Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <[email protected]>
Cc: [email protected]
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer. Found by
inspection.
v2: Protect against out-of-range BufferOffset (Nanley).
Cc: [email protected]
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way. E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly. It's easier to fix it all in one
place.
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This patch adds {X,A}BGR2101010 entries to the list of supported
'intel_image_formats'.
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add R10G10B10{A,X}2 translation between mesa_format and DRI format
to driGLFormatToImageFormat() and driImageFormatToGLFormat().
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The only function that doesn't need to call access_raw is map_blit. If
it takes the blitter path, it will happen as part of intel_miptree_copy.
If map_blit takes the blorp path, brw_blorp_copy_miptrees will handle
doing whatever resolves are needed. This should save us resolves in
quite a few cases and will probably help performance a bit.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
We still support the blitter on gen4-5 but it's on the same ring as 3D.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
It's faster than the blitter and can handle things like stencil properly
so it doesn't require software fallbacks.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
The blorp path (called first) can do anything the blitter path can do so
it's just dead code.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On gen4-5, we try the blitter before we even try blorp. On newer
platforms, blorp can do everything the blitter can so there's no point
in even having the blitter fall-back path.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This function is no longer used.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using meta for anything is fairly aweful and definitely has more CPU
overhead. However, it also uses the 3D pipe and is therefore likely
faster in terms of GPU time than the blitter. Also, the blitter code
has so many early returns that it's probably not buying us that much.
We may as well just use meta all the time instead of working over-time
to find the tiny case where we can use the blitter. We keep gen4-5
using the old blit paths to avoid perturbing old hardware too much.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'd like to start using soft-pin to assign BO addresses up front, and
never move them again. Our previous plan for dealing with 48-bit VF
cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
have addresses that alias in the low 32 bits. But that requires moving
buffers dynamically.
This patch tracks the last seen BO address for each vertex/index buffer,
and emits a VF cache invalidate if the high bits change. (Ideally, we
won't hit this case very often.) This should work for the soft-pin
case, but unfortunately won't work in the relocation case, as we don't
actually know the addresses. So, we have to use both methods.
v2: Mention that the cache uses a <VertexBufferIndex, Address> tuple
more explicitly (suggested by Scott). Mention "single batch" too
(suggested by Chris).
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're planning to start managing the PPGTT in userspace in the near
future, rather than relying on the kernel to assign addresses. While
most buffers can go anywhere, some need to be restricted to within 4GB
of a base address.
This commit adds a "memory zone" parameter to the BO allocation
functions, which lets the caller specify which base address the BO will
be associated with, or BRW_MEMZONE_OTHER for the full 48-bit VMA.
Eventually, I hope to create a 4GB memory zone corresponding to each
state base address.
Reviewed-by: Scott D Phillips <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This is useful for every user of ISL. Drop the comment along the way to
match similar functions in ISL.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
intel_miptree_supports_{ccs,mcs,hiz} ensures the format is valid for the
color or depth miptree before the miptree is assigned an aux_usage.
alloc_aux switches on the aux_usage so don't assert that the format is
valid.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Synchronize the requirements listed in isl_surf_get_ccs_surf with
intel_miptree_supports_ccs by importing a restriction from ISL. Some
implications:
* We successfully create every aux_surf in alloc_aux
* We only return false from alloc_aux if we run out of memory
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
The flush_vertices argument is now unused, remove it.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
| |
With the previous patches, we now update the indirect clear color buffer
every time the clear color changes. Avoid redundant updates.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
If the aux state is CLEAR and clear color value has changed, only the
surface state must be updated. The bit-pattern in the aux buffer is
exactly the same.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This comment made more sense when it was above the calls to
intel_miptree_slice_set_needs_depth_resolve(). We stopped using these
functions at commit 554f7d6d02931ea45653c8872565d21c1678a6da
("i965: Move depth to the new resolve functions").
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For depth buffers, we avoid fast-clearing if the aux_state is already
CLEAR. We do the same for color buffers only if the clear color
doesn't change. We require that the clear colors match because, in
that case, we don't update the indirect clear color outside of BLORP.
Update the indirect clear color for color buffers as well. We'll
enable the same depth buffer optimization for color buffers in a
later patch.
Note that we're now actually updating the indirect clear color twice
in the case where we use BLORP to perform the fast-clear. This is
only temporary. In later patches, we'll prevent BLORP from performing
the update.
v2: Add more context to the commit message (Topi).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Reduce complexity and allow the next patch to delete some code. With
this change, clear operations will still be skipped and setting the
aux_state will cause no side-effects.
Remove the associated comment which implies an early return.
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Reduce code duplication now and prevent it in the following commits.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 1d94aa19877fb702ffacacde28ad7253cce72c97.
The next patch will make depth miptrees use the clear color setter that
was originally being used for color miptrees. Go back to using the
isl_color_value parameter because it's the same type as the
fast_clear_color field used by color and depth miptrees.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There isn't much that changes between the aux allocation functions.
Remove the duplicated code.
v2: Inline the switch statement (Jason).
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're going to delete intel_miptree_alloc_ccs() in the next commit. With
that in mind, replace the use of this function in
do_single_blorp_clear() with intel_miptree_alloc_aux() and move the
delayed allocation logic to it's callers.
v2: Duplicate the delayed allocation comment (Topi Pohjolainen).
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Drop an unused parameter.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
We have enough information to determine the optimal flags internally.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
A name of "aux-miptree" should be sufficient.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The indirect clear color isn't correctly tracked in
intel_miptree::fast_clear_color. The initial value of ::fast_clear_color
is zero, while that of the indirect clear color is undefined.
Topi Pohjolainen discovered this issue with MCS buffers. This issue is
apparent when fast-clearing an MCS buffer for the first time with
glClearColor = {0.0,}. Although the indirect clear color is undefined,
the initial aux state of the MCS is CLEAR and the tracked clear color is
zero, so we avoid updating the indirect clear color with {0.0,}.
Make the indirect clear color match the initial value of
::fast_clear_color.
Note: although we only have to drop HiZ's BO_ALLOC_BUSY flag for gen10+,
we also drop it pre-gen10 to keep things simple. We add this flag back
for pre-gen10 in a later patch.
v2: Add a note about dropping HiZ's BO_ALLOC_BUSY flag (Topi).
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Add infrastructure for initializing the clear color BO.
intel_miptree_init_mcs is no longer needed with change.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the aux_state was actually AUX_INVALID because the BO
was never defined. This was fine on single slice miptrees because we
would fast-clear the resource right after creation. For multi-slice
miptrees on SKL+ however, this results in undefined behavior when
accessing a non-base slice. Here's a specific example:
1) Fast clear level 0
* Undefined CCS_D buffer allocated in "PASS_THROUGH" state.
* Level 0 transitions to the CLEAR state.
2) Render to level 1
* Level 1 may have a 2-bit pattern of 2's.
* Rendering with a 2 in the CCS is undefined.
Cc: <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, if we failed to initialize an MCS buffer, we'd
end up in a state in which the miptree thinks it has an MCS buffer,
but doesn't. We also leaked the clear_color_bo if it existed.
With this patch, we now free the miptree aux buffer resources and let
intel_miptree_alloc_mcs() know that the MCS buffer no longer exists.
Cc: <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning
Signed-off-by: jenny.q.cao <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The only remaining users of gl_vertex_array are tnl based
drivers. So move everything related to that into tnl and
rename it accordingly.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
Only tnl based drivers still use this array. So remove it
from core mesa and use Array._DrawVAO instead.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
| |
Was meant to be temporary in i965.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For now store binding and attrib in brw_vertex_element.
The i965 driver still provides lots of opportunity to make use
of the unique binding information in the VAO which is currently not
taken from the VAO.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the bspec docs for "Indirect State Pointers Disable":
"At the completion of the post-sync operation associated with this
pipe control packet, the indirect state pointers in the hardware are
considered invalid"
So the ISP disable is a post-sync type of operation which means that it
should be combined with a CS stall. Without this, the simulator throws
an error.
Fixes: 766d801ca "anv: emit pixel scoreboard stall before ISP disable"
Fixes: f536097f6 "i965: require pixel scoreboard stall prior to ISP disable"
Reviewed-by: Lionel Landwerlin <[email protected]>
|