| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Stumbled into this when adding a new PIPE_CAP.
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This looks like an evergreen specific feature, but with atomic
counters AMD have hw specific counters they use instead of operating
on buffers directly. These are separate to the buffer atomics,
so require different limits and code paths.
I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.
This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.
I'd like to land this and the st patch then I can start to
upstream the evergreen support for these and other GL4.x features.
v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all
BOs placed in the mesa BO cache as purgeable so that the system can
reclaim this memory under memory pressure.
v2:
- Removed BOs from the cache when they've been purged by the kernel
- Check whether the madvise ioctl is supported or not before using it
v3: Don't walk the whole list when we find a busy BO (by anholt, acked by
Boris)
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Tested-by: Timothy Arceri <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
Some hw (evergreen) has a limit on how many combined (images/buffers/mrts)
a fragment shader can access.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need
to explicitly compile this code out.
Fixes: 3df78928786134874eafa "vc4: Drop reloc_count tracking for debug
asserts on non-debug builds."
Cc: Eric Anholt <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This was causing Android clang version 3.8.256229 to miscompile,
presumably due to strict aliasing.
Fixes: 14dc281c1332 ("vc4: Enforce one-uniform-per-instruction after optimization.")
|
|
|
|
|
|
|
|
| |
It's redundant with nir_shader::info::stage.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
| |
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
If there happened to be an ENOENT laying around, we would try using the
ioctls later and fail out resource allocation.
|
|
|
|
|
| |
It was calling down into i915 trying to label the BO, which is definitely
not the right thing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6).
V3D 3.3 introduces an MMU (no more CMA allocations) and support for
GLES3.1. This driver is not currently conformant, though that will be a
target as soon as possible.
V3D 3.x parts use a new texture tiling layout common across many Broadcom
graphics parts including and the HVS scanout engine. It also massively
changes the QPU instructions, introducing a common physical register file
(no more A/B split) and half-float instructions, while removing the 4x8
unorm instructions in favor of half-float for talking to fixed function
interfaces. Because so much has changed, vc5 is implemented in a separate
gallium driver, using only the XML code-generation support from vc4.
v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit
returns. Fix up a bit of MRT setup. Sync the simulator to kernel
behavior a bit more. Improve uniform debugging code. Rebase on
QIR->VIR rename. Move texture state mostly to the CSOs. Improve
cache flushing on the simulator. Fix program deletion
use-after-frees.
Acked-by: Dave Airlie <[email protected]> (uabi plan)
Acked-by: Daniel Vetter <[email protected]> (uabi plan)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been doing this inside of vc4, but vc5 wants it as well and it may be
useful for other drivers (Intel has a related path for pre-gen6 with MRT,
and freedreno had a TGSI path for it at one point).
This required defining a common enum for the standard comparison
functions, but other lowering passes are likely to also want that enum.
v2: Add to meson.build as well.
Acked-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because vc4 can control the order that tiles are rasterized in, we can use
it to implement overlapping blits using normal drawing and
GL_ARB_texture_barrier, as long as we can tell the kernel what order to
render the tiles in.
v2: Fix on the simulator.
v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
v5: Split from the core gallium commit, drop some unnecessary code related
to glBlitFramebuffer(), fix a crash with clears before state has been
bound.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because vc4 can control the order that tiles are rasterized in, we can use
it to implement overlapping blits using normal drawing and
GL_ARB_texture_barrier, as long as we can tell the kernel what order to
render the tiles in.
This commit introduces the core gallium support, vc4 changes will follow.
v2: Fix on the simulator.
v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
v5: Drop vc4 changes from this commit, for clarity.
Reviewed-by: Nicolai Hähnle <[email protected]> (v3)
|
|
|
|
|
|
| |
Improves x11perf -copywinwin100 from ~2000/sec to ~4700/sec. More
importantly, this is a prerequisite for the new GL_MESA_tile_raster_order
extension.
|
|
|
|
|
| |
This keeps us from promoting them up to 8888, at the cost of not being
color-renderable.
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the DRM_VC4_GET_TILING ioctl isn't present then we can't tell
if a dmabuf bo is tiled or linear, so will always assume it's
linear.
By not advertising tiled formats in this situation we ensure the
assumption is correct.
This fixes a bug where most attempts to render a gl wayland client
under weston will result in a client side abort.
Signed-off-by: Derek Foreman <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Acked-by: Daniel Stone <[email protected]> (on irc)
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I remember thinking "gosh, it would be nice if I could do a kernel-style
'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was
compiled on both builds", and then forgot to test a release build anyway.
Fixes: a8fd58eae596 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.")
Reported-by: Derek Foreman <[email protected]>
|
|
|
|
|
|
|
| |
This has proven to be incredibly useful for debugging CMA allocation
failures and driving memory management improvements. However, we don't
want to burden entry and exit from the BO cache with the labeling ioctl's
overhead on release builds.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.
Fix that by moving the cleared_and_retried variable definitions at the
beginning of the function. While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Fixes: 78087676c98aa8884ba92 "vc4: Restructure the simulator mode."
|
|
|
|
|
|
|
|
|
|
|
| |
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).
To avoid that, move the shadow resource to the vc4_sampler_view struct.
Fixes: f0ecd36ef8e1 ("st/mesa: add an entirely separate codepath for setting up buffer views")
|
|
|
|
|
|
|
| |
Denotes availability of 64bit int atomic instructions
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile. Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost. Fixes valgrind
warnings on glsl-max-varyings.
Fixes: 2350569a78c6 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
|
|
|
|
|
| |
Even if we're not clearing color, the blitter has started dereferencing
the color value.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The blitter will bind just the depth buffer, which flushes the current job
if we had both a color and depth/stencil. If the clear was doing partial
depth/stencil (quad-based) and color (tile-based), we'd go on to try to
set up the rest of the tile clear in the now flushed job.
Instead, move the partial clear up before we start setting up the job for
the current FBO state, and re-fetch the job if we're continuing on to a
tile-based clear. Fixes valgrind failures in fbo-depthtex.
Fixes: 9421a6065c4e ("vc4: Fix fallback to quad clears of depth in GLX.")
|
|
|
|
|
|
|
|
| |
I was trying to continue the hash table loop, not the inner loop. This
tended to work out, because we would have *just* freed the job struct.
Fixes some valgrind failures in fbo-depthtex.
Fixes: f597ac396640 ("vc4: Implement job shuffling")
|
|
|
|
|
|
|
|
|
| |
Denotes native half precision float operations capability
v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
fix indentation
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Changed all register and instruction names, works the same.
v2: Rebase on build system changes (by anholt)
v3: Fix build on clang (by anholt, reported by Rob)
Signed-off-by: Jonas Pfeil <[email protected]>
Tested-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs. This also keeps us from building identical, non-NEON code on
aarch64 and x86.
Fixes: a373f77662c5 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")
v2: Fix Android build by just appending NEON_C_SOURCES when
ARCH_ARM_HAVE_NEON.
Tested-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can be used to guard support for EXT_memory_object and related
extensions.
v2: update gallium docs
v3 (Timothy Arceri):
- add cap to nv50
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit
in the documentation
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This allows drivers more freedom in how exactly they want to lower I/O,
e.g. first lowering I/O to temporaries.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
It's a duplicate of glsl_type::count_attribute_slots.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.
The effect on instruction count is pretty impressive:
total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)
and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
|
|
|
|
|
|
| |
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
|
|
|
|
|
|
|
| |
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: [email protected]
|
|
|
|
|
|
| |
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code. 250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VC4_DEBUG_CL output goes from:
0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING
0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT
0x00000012 0x00000012: 0x12
0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW
0x00000014 0x00000014: 0x00
0x00000015 0x00000015: 0x00
0x00000016 0x00000016: 0x00
0x00000017 0x00000017: 0x00
0x00000018 0x00000018: 0xfa
0x00000019 0x00000019: 0x00
0x0000001a 0x0000001a: 0xfa
0x0000001b 0x0000001b: 0x00
to:
0x00000010 0x00000010: 0x06 Start Tile Binning
0x00000011 0x00000011: 0x38 Primitive List Format
Data Type: 1 (16-bit index)
Primitive Type: 2 (Triangles List)
0x00000013 0x00000013: 0x66 Clip Window
Clip Window Height in pixels: 250
Clip Window Width in pixels: 250
Clip Window Bottom Pixel Coordinate: 0
Clip Window Left Pixel Coordinate: 0
v2: Squash in robher's fixes for Android
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of importing the resource from renderonly after allocation
is opaque and is taking away control from the driver, which it needs in
order to implement more advanced scenarios than the simple linear scanout
with matching stride alignments.
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Acked-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 7f80a9ff1312 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:
out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found
The path of the generated header needs to be fixed since we build out of
tree.
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (a full
copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <[email protected]>
|