| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Changed all register and instruction names; the code works the same.
v2: Rebase on build system changes (by anholt)
v3: Fix build on clang (by anholt, reported by Rob)
Signed-off-by: Jonas Pfeil <[email protected]>
Tested-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs. This also keeps us from building identical, non-NEON code on
aarch64 and x86.
Fixes: a373f77662c5 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")
v2: Fix Android build by just appending NEON_C_SOURCES when
ARCH_ARM_HAVE_NEON.
Tested-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This can be used to guard support for EXT_memory_object and related
extensions.
v2: update gallium docs
v3 (Timothy Arceri):
- add cap to nv50
Signed-off-by: Andres Rodriguez <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit
in the documentation
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This allows drivers more freedom in how exactly they want to lower I/O,
e.g. first lowering I/O to temporaries.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
It's a duplicate of glsl_type::count_attribute_slots.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.
The effect on instruction count is pretty impressive:
total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)
and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
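The selection order described above (r4 first, then round-robin over accumulators, then round-robin over the physical regfile) can be sketched roughly as follows. This is only an illustration, not the driver's code: the `RegPicker` class, the register name lists, and the tiny 8-entry physical regfile are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical register names, for illustration only.
ACCUMULATORS = ["r0", "r1", "r2", "r3"]
PHYS_REGS = [f"ra{i}" for i in range(4)] + [f"rb{i}" for i in range(4)]


@dataclass
class RegPicker:
    next_acc: int = 0   # round-robin cursor into ACCUMULATORS
    next_phys: int = 0  # round-robin cursor into PHYS_REGS

    def pick(self, available):
        """Return a register from the set `available`, or None if none fits."""
        # 1) Prefer r4: per the message, anything else would force a MOV.
        if "r4" in available:
            return "r4"
        # 2) Round-robin through accumulators (avoids regfile RAW delay slots).
        reg = self._round_robin(ACCUMULATORS, "next_acc", available)
        if reg is not None:
            return reg
        # 3) Fall back to round-robin through the physical regfile.
        return self._round_robin(PHYS_REGS, "next_phys", available)

    def _round_robin(self, regs, cursor_name, available):
        start = getattr(self, cursor_name)
        for i in range(len(regs)):
            idx = (start + i) % len(regs)
            if regs[idx] in available:
                # Advance the cursor past the register we just handed out.
                setattr(self, cursor_name, (idx + 1) % len(regs))
                return regs[idx]
        return None
```

Keeping a separate cursor per register class means consecutive picks land on different registers whenever more than one is free, which is the point of the round-robin.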
|
|
|
|
|
|
| |
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
|
|
|
|
|
|
|
| |
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: [email protected]
|
|
|
|
|
|
| |
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code. 250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.
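The two numbers above are consistent with a fixed-point field with 4 fractional bits, since 2000 / 16 = 125.0. The sketch below assumes that encoding; the function names are made up for illustration and are not part of the generated pack/unpack code.

```python
FRACTION_BITS = 4  # assumption inferred from 2000 / 125.0 == 16


def decode_fixed(raw, fraction_bits=FRACTION_BITS):
    """Convert a raw fixed-point integer to a float."""
    return raw / (1 << fraction_bits)


def encode_fixed(value, fraction_bits=FRACTION_BITS):
    """Convert a float to the nearest raw fixed-point integer."""
    return round(value * (1 << fraction_bits))
```

With this, `decode_fixed(2000)` gives the 125.0 center of a 250x250 render target, and `encode_fixed(125.0)` gives back the raw 2000 that the dump used to print.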
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VC4_DEBUG_CL output goes from:
0x00000010 0x00000010: 0x06 VC4_PACKET_START_TILE_BINNING
0x00000011 0x00000011: 0x38 VC4_PACKET_PRIMITIVE_LIST_FORMAT
0x00000012 0x00000012: 0x12
0x00000013 0x00000013: 0x66 VC4_PACKET_CLIP_WINDOW
0x00000014 0x00000014: 0x00
0x00000015 0x00000015: 0x00
0x00000016 0x00000016: 0x00
0x00000017 0x00000017: 0x00
0x00000018 0x00000018: 0xfa
0x00000019 0x00000019: 0x00
0x0000001a 0x0000001a: 0xfa
0x0000001b 0x0000001b: 0x00
to:
0x00000010 0x00000010: 0x06 Start Tile Binning
0x00000011 0x00000011: 0x38 Primitive List Format
    Data Type: 1 (16-bit index)
    Primitive Type: 2 (Triangles List)
0x00000013 0x00000013: 0x66 Clip Window
    Clip Window Height in pixels: 250
    Clip Window Width in pixels: 250
    Clip Window Bottom Pixel Coordinate: 0
    Clip Window Left Pixel Coordinate: 0
v2: Squash in robher's fixes for Android
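As a rough illustration of how the raw bytes in the old dump map to the decoded fields in the new one, the sketch below decodes the same twelve bytes by hand. The field layout is an assumption that happens to match this dump (data type in the high nibble, primitive type in the low nibble, and the clip window as four little-endian 16-bit values in left, bottom, width, height order); the real decoder is generated from the XML and is not shown here.

```python
import struct

# Bytes from the "before" dump above, starting at offset 0x10.
CL = bytes([0x06, 0x38, 0x12,
            0x66, 0x00, 0x00, 0x00, 0x00, 0xfa, 0x00, 0xfa, 0x00])


def decode(cl):
    """Decode the three packet types seen in the dump into (name, fields)."""
    out = []
    pos = 0
    while pos < len(cl):
        opcode = cl[pos]
        if opcode == 0x06:
            out.append(("Start Tile Binning", {}))
            pos += 1
        elif opcode == 0x38:
            arg = cl[pos + 1]
            out.append(("Primitive List Format", {
                "Data Type": arg >> 4,         # assumed: high nibble
                "Primitive Type": arg & 0xF,   # assumed: low nibble
            }))
            pos += 2
        elif opcode == 0x66:
            # Assumed field order; the dump alone cannot confirm it,
            # because left == bottom and width == height here.
            left, bottom, width, height = struct.unpack_from("<4H", cl, pos + 1)
            out.append(("Clip Window", {
                "Clip Window Height in pixels": height,
                "Clip Window Width in pixels": width,
                "Clip Window Bottom Pixel Coordinate": bottom,
                "Clip Window Left Pixel Coordinate": left,
            }))
            pos += 9
        else:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at offset {pos}")
    return out
```

Calling `decode(CL)` yields the same names and values as the "to:" listing: data type 1, primitive type 2, and a 250x250 clip window at (0, 0).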
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of importing the resource from renderonly after allocation
is opaque and takes control away from the driver, which the driver needs in
order to implement more advanced scenarios than simple linear scanout
with matching stride alignments.
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
Acked-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 7f80a9ff1312 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:
out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found
The path of the generated header needs to be fixed since we build out of
tree.
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (which
previously needed a full copy of each shared texture per draw call), allows
X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
| |
I kept flipping the bool on for debug, so let's just make it available.
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Needing to get our uapi header from libdrm has only complicated things.
Follow intel's lead and drop our requirement for it.
Generated from the same commit mentioned in the README.
v2: Update Android.mk as well, move vc4_drm.h reference for distcheck.
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
| |
The kernel hasn't been synchronous in a couple of years, plus there was
synchronization code right there.
|
|
|
|
|
|
|
|
|
| |
Ensure vc4_cl_dump.h and $(BROADCOM_FILES) are distributed in the
dist-file.
This fixes `make distcheck`
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
For now this is a no-op on the output, but it makes it clear that we've
had weird things going on with things like
V3D21_CLIPPER_Z_SCALE_AND_OFFSET.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This gets our vc4_emit.c size back down a bit:
before:
1020 0 0 1020 3fc src/gallium/drivers/vc4/.libs/vc4_emit.o
after:
968 0 0 968 3c8 src/gallium/drivers/vc4/.libs/vc4_emit.o
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Take the CL pointer in, which will be useful for enabling relocs.
However, our code expands a bit more:
before:
4449 0 0 4449 1161 src/gallium/drivers/vc4/.libs/vc4_draw.o
988 0 0 988 3dc src/gallium/drivers/vc4/.libs/vc4_emit.o
after:
4481 0 0 4481 1181 src/gallium/drivers/vc4/.libs/vc4_draw.o
1020 0 0 1020 3fc src/gallium/drivers/vc4/.libs/vc4_emit.o
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This slightly inflates the size of the generated code, in exchange for
getting us some convenient tools.
before:
4389 0 0 4389 1125 src/gallium/drivers/vc4/.libs/vc4_draw.o
808 0 0 808 328 src/gallium/drivers/vc4/.libs/vc4_emit.o
after:
4449 0 0 4449 1161 src/gallium/drivers/vc4/.libs/vc4_draw.o
988 0 0 988 3dc src/gallium/drivers/vc4/.libs/vc4_emit.o
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I really liked this idea, as it should help with management of packet
parsing tools like the CL dump. The python script is forked off of theirs
because our packets are byte-based instead of dwords, and the changes to
do so while avoiding performance regressions due to unaligned accesses
were quite invasive.
v2: Fix Android.mk paths, drop shebang for python script, fix overlap
detection.
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
Tested-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of having special driver loading logic for Android, create
symlinks to gallium_dri.so so we can use the standard loading logic.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
|
|
|
| |
These variables are all used in an assert(), so release builds see no
usages.
|
|
|
|
|
|
|
|
| |
There's no reason we can't -- the mappings we expose are basically
equivalent to persistent/coherent, already.
Improves mesa-demos drawoverhead (no state change) performance by
5.21362% +/- 1.25078% (n=11).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.
The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 BO and a pl111 BO and copies between the
two. The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width. The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.
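The stride mismatch described above can be illustrated with a small calculation. This sketch assumes 32bpp (4 bytes per pixel), so that 4-pixel alignment means a 16-byte stride alignment; the helper names are hypothetical and do not correspond to functions in either driver.

```python
CPP = 4                # bytes per pixel at 32bpp
VC4_STRIDE_ALIGN = 16  # 4 pixels * 4 bytes, per the message above


def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def vc4_stride(width):
    """Stride in bytes that vc4 would want for a 32bpp surface."""
    return align(width * CPP, VC4_STRIDE_ALIGN)


def strides_agree(width):
    """True if the vc4 stride also satisfies pl111's stride == width rule."""
    return vc4_stride(width) == width * CPP
```

For a width that is a multiple of 4, such as 800, the two requirements agree. For a width like 801 they do not, and that is the case where, per the message, the kernel rejects the modeset.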
v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Whether bindless texture operations are supported by the
underlying driver.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Coverity caught dead code, left over from a copy-paste, that used
found_colors[] and num_found_colors.
CID: 1341850
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.
Cc: [email protected]
|
|
|
|
|
|
| |
I've since found them to be more confusing by adding indirections than
clarifying by screening off resources from the handle/fd import/export
process.
|
|
|
|
|
| |
We only ever attached one vtbl, so it was a waste of space and
indirections.
|
|
|
|
|
|
| |
for skipping mapped-buffer checking in every GL draw call
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
src/gallium/targets/dri/Android.mk contains lots of conditionals for
individual drivers. Let's move these details into the individual driver
makefiles.
In the process, align the make driver conditionals with automake
(i.e. HAVE_GALLIUM_*).
Signed-off-by: Rob Herring <[email protected]>
[Emil Velikov: add the radeon winsys for radeonsi]
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
The next patch will use it. This is really for svga and GL2-level drivers.
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
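The rules above can be modelled in a few lines. This is a Python stand-in for the C struct, meant only to show the semantics: the `DrawInfo` class, its flattened `index_resource` / `index_user` fields (standing in for the `index` union), and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrawInfo:
    index_size: int = 0                    # bytes per index; 0 means non-indexed
    has_user_indices: bool = False         # selects which union member is valid
    start: int = 0
    index_resource: Optional[str] = None   # stands in for index::resource
    index_user: Optional[bytes] = None     # stands in for index::user


def index_source(info):
    """Return the active index source, or None for a non-indexed draw."""
    if info.index_size == 0:
        return None
    return info.index_user if info.has_user_indices else info.index_resource


def indirect_first_index(info, indirect_start):
    """Effective start of an indirect draw.

    Only indexed indirect draws add `start`; for non-indexed indirect
    draws it is ignored.
    """
    if info.index_size != 0:
        return indirect_start + info.start
    return indirect_start
```

A driver following this model checks `index_size` first and never touches either index member when it is zero.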
v2: fixes for nine, svga
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit e1af20f18a86f52a9640faf2d4ff8a71b0a4fa9b changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c92100d77c660f9783504c6d80fa1d58, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow Raspbian's ARMv6 builds to take advantage of the new NEON
code, and could prevent problems if vc4 ends up getting used on a v7 CPU
without NEON.
v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
|
|
|
|
|
|
|
|
|
| |
Android.mk was setting the flag across the entire driver, so we didn't
have non-NEON versions getting built. This was going to be a problem with
the next commit, when I start auto-detecting NEON support and use the
non-NEON version when appropriate.
Reviewed-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
NEON is sufficiently different on arm64 that we can't just reuse this
code. Disable it on arm64 for now.
v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build
for a v8 CPU.
Signed-off-by: Eric Anholt <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Most drivers don't need it and shouldn't need it because it can't be used
in some cases (indirect draws, primitive restart, count from streamout).
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
This version of the chip is present on the Cygnus-based 911360 enterprise
phone platform. It appears to be completely backwards compatible.
|