| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
v2: fix enabling primitive binning
Reviewed-by: Samuel Pitoiset <[email protected]>
|
| |
|
| |
|
|
|
|
| |
Only radeonsi uses them, so adjust them to match its needs.
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was required back when MSVC didn't support C99 and was missing this
header, but since MSVC 2013 (or maybe earlier?) it does, and this code
isn't doing anything anymore.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Fixes: bb84fa146f22 ("util: use C99 declaration in the for-loop hash_table_foreach() macro")
|
|
|
|
|
|
| |
We were doing this late after nir_lower_io, but we can just reuse the core
code. By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
|
|
|
|
|
| |
This lets us trim unused trailing components in the vertex attributes,
reducing the size of our VPM allocations.
|
|
|
|
|
|
| |
Signed-off-by: Michał Janiszewski <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Fixes: a537231b226280bc1e5b7 "meson: build svga driver on linux"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the
u_blitter clear path is used (a3xx, a4xx, and some fallback cases on
newer gens), util_blitter_restore_fb_state() will set_framebuffer_state()
to something that is identical to the current fb state, which triggers
an unnecessary flush, and then eventually an assert:
(gdb) bt
#0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6
#1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322
#2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491
#3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463
#4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452
The assert was introduced in 4b847b38ae3, so from a functionality
standpoint this patch fixes that commit. But it should also avoid an
unnecessary flush in the 'inorder' case, fixing a performance bug.
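One way to picture the fix is an early return when the incoming framebuffer state matches the current one. The sketch below is illustrative only: `struct fb_state`, `struct ctx`, `fb_state_equal()` and `ctx_set_framebuffer_state()` are hypothetical stand-ins, not the freedreno or gallium definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified framebuffer state. */
struct fb_state {
    unsigned width, height;
    const void *cbuf;   /* color buffer */
    const void *zsbuf;  /* depth/stencil buffer */
};

/* Hypothetical context that counts how often it had to flush. */
struct ctx {
    struct fb_state fb;
    unsigned flushes;
};

bool fb_state_equal(const struct fb_state *a, const struct fb_state *b)
{
    return a->width == b->width && a->height == b->height &&
           a->cbuf == b->cbuf && a->zsbuf == b->zsbuf;
}

void ctx_set_framebuffer_state(struct ctx *c, const struct fb_state *fb)
{
    assert(c != NULL && fb != NULL);

    /* Identical state: nothing changes, so skip the flush entirely. */
    if (fb_state_equal(&c->fb, fb))
        return;

    c->flushes++;   /* stands in for flushing the current batch */
    c->fb = *fb;
}
```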
Fixes: 4b847b38ae3 freedreno: make fd_batch a one-shot thing
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
ZSA state can change whether depth or stencil is enabled.
This plus the previous patch fix stk, and various things w/
FD_MESA_DEBUG=inorder.
Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The problem isn't directly with ec717fc629 but rather that commit
exposes the problem. When we switch batch we cannot assume previous
state is clean so we should mark all state dirty.
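A minimal sketch of the idea, assuming a bitmask of dirty flags on the context. The names `struct ctx`, `DIRTY_*` and `ctx_set_batch()` are hypothetical and only illustrate the "new batch means everything is dirty" rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dirty bits; the real driver tracks many more. */
#define DIRTY_BLEND 0x1u
#define DIRTY_ZSA   0x2u
#define DIRTY_PROG  0x4u
#define DIRTY_ALL   (~0u)

struct ctx {
    int batch_id;     /* identifies the batch currently being recorded */
    unsigned dirty;   /* state that must be re-emitted on the next draw */
};

void ctx_set_batch(struct ctx *c, int batch_id)
{
    assert(c != NULL);
    if (c->batch_id == batch_id)
        return;

    c->batch_id = batch_id;
    /* Nothing emitted into the previous batch carries over, so treat
     * every piece of state as needing re-emission. */
    c->dirty = DIRTY_ALL;
}
```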
Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that it is just called once per draw (instead of once for binning
and once for draw), let's just inline it. If nothing else, it makes
perf-annotate easier to look at.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Historically this wasn't in fdN_emit_state(), because prior to addition
of blitter in a5xx, fdN_emit_state() was also used in the clear path.
These days that is only true for a2xx (a3xx and a4xx use u_blitter). So
the reason for it not to be in fd6_emit_state() no longer exists.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Noticed that with webgl (in chromium, at least) we end up generating a
lot of no-op submits just to get a fence. Tracking the last fence and
returning that if there is no rendering since last flush avoids this.
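The fence reuse can be sketched roughly as below. This is a toy model, not the driver code: `struct ctx`, `ctx_draw()` and `ctx_flush()` are hypothetical, and a fence is reduced to an integer id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context; a fence is modelled as a non-zero id. */
struct ctx {
    bool dirty;           /* rendering queued since the last flush */
    unsigned submits;     /* number of submits actually issued */
    unsigned last_fence;  /* fence from the most recent submit, 0 if none */
};

void ctx_draw(struct ctx *c)
{
    assert(c != NULL);
    c->dirty = true;
}

/* Returns a fence for everything rendered so far. */
unsigned ctx_flush(struct ctx *c)
{
    assert(c != NULL);

    /* No new rendering: hand back the previous fence, no submit needed. */
    if (!c->dirty && c->last_fence != 0)
        return c->last_fence;

    c->submits++;
    c->last_fence = c->submits;
    c->dirty = false;
    return c->last_fence;
}
```

Calling `ctx_flush()` twice in a row without an intervening `ctx_draw()` returns the same fence and issues only one submit in this model.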
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
The scissor maxx/maxy are non-inclusive, so don't subtract one from
framebuffer width and height.
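As an illustration of the exclusive-max convention, a full-framebuffer scissor can be written as follows. `struct scissor_rect` and the helper names are hypothetical, chosen only for this sketch.

```c
#include <assert.h>

/* Hypothetical scissor rectangle: min edges inclusive, max edges exclusive. */
struct scissor_rect {
    unsigned minx, miny;
    unsigned maxx, maxy;
};

struct scissor_rect full_fb_scissor(unsigned fb_width, unsigned fb_height)
{
    assert(fb_width > 0 && fb_height > 0);
    /* Because maxx/maxy are exclusive, they equal the framebuffer size;
     * subtracting one would drop the last column and row. */
    struct scissor_rect s = { 0, 0, fb_width, fb_height };
    return s;
}

unsigned scissor_width(const struct scissor_rect *s)
{
    return s->maxx - s->minx;
}

unsigned scissor_height(const struct scissor_rect *s)
{
    return s->maxy - s->miny;
}
```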
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
We get a warning here for assigning a const char * pointer to
char *swizzle in struct ir2_src_register. The constructor strdups a 4
byte string here, so just memcpy to that instead.
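A rough sketch of the pattern, assuming the register owns a writable 4-character swizzle buffer allocated by its constructor. `struct src_reg` and its helpers are hypothetical, and `malloc` plus `memcpy` stands in for the `strdup` mentioned above to keep the example portable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical register with an owned, writable swizzle string. */
struct src_reg {
    char *swizzle;  /* 4 characters plus NUL, allocated in the constructor */
};

struct src_reg *src_reg_create(void)
{
    struct src_reg *r = malloc(sizeof(*r));
    if (r == NULL)
        return NULL;
    r->swizzle = malloc(5);
    if (r->swizzle == NULL) {
        free(r);
        return NULL;
    }
    memcpy(r->swizzle, "xyzw", 5);  /* default swizzle, including the NUL */
    return r;
}

/* Copy into the owned buffer instead of assigning a const char * to
 * the non-const member, which is what triggers the warning. */
void src_reg_set_swizzle(struct src_reg *r, const char *swiz)
{
    assert(r != NULL && swiz != NULL && strlen(swiz) == 4);
    memcpy(r->swizzle, swiz, 4);
}

void src_reg_destroy(struct src_reg *r)
{
    if (r != NULL) {
        free(r->swizzle);
        free(r);
    }
}
```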
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
| |
Move it to a header and use it where possible to avoid vfunc call.
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the pursuit of lowering driver overhead, it became clear that some
amount of redesign of how libdrm_freedreno constructs the submit ioctl
would be needed. In particular, as the gallium driver is starting to
make heavier use of CP_SET_DRAW_STATE state groups/objects, the
overhead of tracking cmd buffers and relocs becomes too much. And for
"streaming" state, which isn't ever reused (like uniform uploads) the
overhead of allocating/freeing ringbuffer[1] objects is too high.
This redesign makes two main changes:
1) Introduces a fd_submit object for tracking bos and cmds table
for the submit ioctl, making ringbuffer objects more
lightweight. This was previously done in the ringbuffer. But we
have many ringbuffer instances involved in a submit (gmem +
draw + potentially 1000's of state-group rbs), and only need
a single bos and cmds table. (Reloc table is still per-rb)
The submit is also a convenient place for a slab allocator for
ringbuffer objects. Other options would have required locking
because, while we can guarantee allocations will only happen on
a single thread, free's could happen either on the application
thread or the flush_queue thread. With the slab allocator in
the submit object, any frees that happen on the flush_queue
thread happen after we know that the application thread is done
with the submit.
2) Introduce a new "softpin" msm_ringbuffer_sp implementation that
does not use relocs and only has cmds table entries for IB1 (ie.
the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER
to from the RB). To do this properly will require some updates
on the kernel side, so whether you get the softpin or legacy
submit/ringbuffer implementation at runtime depends on your
kernel version.
To make all these changes in libdrm would basically require adding a
libdrm_freedreno2, so this is a good point to just pull the libdrm code
into mesa. Plus it allows for using mesa's hashtable, slab allocator,
etc. And it lets us have asserts enabled for debug mesa builds but
omitted for release builds. And it makes life easier if further API
changes become necessary.
At this point I haven't tried to pull in the kgsl backend. Although
I left the level of vfunc indirection which would make it possible
to have other backends. (And this was convenient to keep to allow
for the "softpin" ringbuffer to coexist.)
NOTE: if bisecting a build error takes you here, try a clean build.
There are a bunch of ways things can go wrong if you still have
libdrm_freedreno cflags.
[1] "ringbuffer" is probably a bad name, the only level of cmdstream
buffer that is actually a ring is RB managed by kernel. User-
space cmdstream is all IB1/IB2 and state-groups.
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This reverts commit a5fd54f8bf6713312fa5efd7ef5cd125557a0ffe.
The whole point was to add a way to pass -DVMX86_STATS to the build,
but we can do that with a command line argument when we invoke scons.
Reviewed-by: José Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.
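Such a conversion helper might look like the sketch below. The enum and function names are hypothetical stand-ins for the gallium ones; the `profile_idc` values are the ones defined by the H.264 specification.

```c
#include <assert.h>

/* Hypothetical profile enum, standing in for the pipe video profiles. */
enum h264_profile {
    H264_PROFILE_BASELINE,
    H264_PROFILE_MAIN,
    H264_PROFILE_EXTENDED,
    H264_PROFILE_HIGH,
    H264_PROFILE_HIGH10,
    H264_PROFILE_HIGH422,
    H264_PROFILE_HIGH444
};

/* Map a profile to the profile_idc value carried in the bitstream. */
unsigned h264_profile_idc(enum h264_profile profile)
{
    switch (profile) {
    case H264_PROFILE_BASELINE: return 66;
    case H264_PROFILE_MAIN:     return 77;
    case H264_PROFILE_EXTENDED: return 88;
    case H264_PROFILE_HIGH:     return 100;
    case H264_PROFILE_HIGH10:   return 110;
    case H264_PROFILE_HIGH422:  return 122;
    case H264_PROFILE_HIGH444:  return 244;
    }
    assert(!"unknown H.264 profile");
    return 0;
}
```

Compared with an array indexed by an offset from the first profile, a switch does not depend on the enum ordering and has an explicit path for values it does not know.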
Signed-off-by: Boyuan Zhang <[email protected]>
Acked-by: Christian König <christian.koenig at amd.com>
|
|
|
|
|
|
|
|
| |
Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.
Signed-off-by: Boyuan Zhang <[email protected]>
Acked-by: Christian König <christian.koenig at amd.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
After discussion with Timothy Arceri: disk_cache_get_function_identifier
was using only the first byte of the sha1 build-id. Replace
disk_cache_get_function_identifier with implementation from
radv_get_build_id. Instead of writing a uint32_t it now writes to a
mesa_sha1. All drivers using disk_cache_get_function_identifier are
updated accordingly.
Reviewed-by: Timothy Arceri <[email protected]>
Fixes: 83ea8dd99bb1 ("util: add disk_cache_get_function_identifier()")
|
|
|
|
|
|
|
|
|
|
| |
Following commits 2385d7b066 and 8e798e28f7, for resource dependency
tracking.
Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
with FD_MESA_DEBUG=inorder
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
To avoid wrong result when identifying the type of register.
Ie. If the reg is an array, it might be identified as address or
predicate register.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Don't leave vsconst/fsconst group enabled if we switch to shader with no
uniforms.
Fixes: abcdf5627a2 freedreno/a6xx: move const emit to state group
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Would have been useful to catch the problem fixed in
8e798e28f736e22e9e1e4534ab42a36cde14b142
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This function's API changed between LLVM 5 and 6. Compile errors occur
when building with LLVM 6+ if LLVM 5 was used for a dist tarball.
CC: <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107865
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and
removed createInstructionSimplifierPass, which were both removed in LLVM
7.0.0
These changes combine patches we received from the community and our own
internal patches
Reviewed-by: Bruce Cherniak <[email protected]>
Tested-by: Chuck Atkins <[email protected]>
|
|
|
|
|
|
|
|
| |
Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82%
FPS improvement with Dirt Rally on my GTX 1060.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL_EXT_texture_buffer introduced texture buffers, which can be used
in shaders through a new type imageBuffer.
Because of how image access is implemented in freedreno, calling
imageSize on an imageBuffer returns the size in bytes instead of texels,
which is incorrect.
This patch adds a division of imageSize result by the bytes-per-pixel
of the image format, when image is buffer-backed.
Fixes all tests under
dEQP-GLES31.functional.image_load_store.buffer.image_size.*
v2: Pre-compute and submit the log2 of the image format's bpp as shader
constant instead of emitting the LOG2 instruction in code. (Rob Clark)
v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin)
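The arithmetic behind v2/v3 can be sketched on the CPU side as follows, assuming a power-of-two bytes-per-pixel. The function names are hypothetical, and POSIX `ffs()` from `<strings.h>` stands in for whichever find-first-bit helper the driver uses.

```c
#include <assert.h>
#include <strings.h>  /* ffs(), POSIX */

/* log2 of a power-of-two bytes-per-pixel value. ffs() returns the
 * 1-based index of the lowest set bit, so subtract one. */
unsigned bpp_log2(unsigned bytes_per_pixel)
{
    assert(bytes_per_pixel != 0 &&
           (bytes_per_pixel & (bytes_per_pixel - 1)) == 0);
    return (unsigned)ffs((int)bytes_per_pixel) - 1;
}

/* Convert a buffer size in bytes to a size in texels. With log2(bpp)
 * precomputed, the division becomes a right shift. */
unsigned buffer_image_size_texels(unsigned size_bytes, unsigned log2_bpp)
{
    return size_bytes >> log2_bpp;
}
```

For a 1024-byte buffer with a 4-byte-per-texel format, `bpp_log2(4)` is 2 and the shift gives 256 texels.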
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Enable vcn jpeg decode for raven.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Implement jpeg target buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.
Signed-off-by: Boyuan Zhang <[email protected]>
Acked-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Implement jpeg bitstream buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.
Signed-off-by: Boyuan Zhang <[email protected]>
Acked-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Move the previous get_mjpeg_slice_header function and eoi from
"radeon/vcn" to "st/va".
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Add a new file to handle VCN Jpeg decode specific functions. Use Jpeg
specific cmd sending function in end_frame call.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Use function pointer for sending cmd in end_frame call. By doing this, we can
assign different cmd sending logics for Jpeg decode later.
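The dispatch pattern can be sketched like this. All names are hypothetical (`struct decoder` here is not the real radeon_decoder), and the two send functions only record which path ran.

```c
#include <assert.h>
#include <stddef.h>

enum send_path { SEND_NONE, SEND_DEC, SEND_JPEG };

/* Hypothetical decoder with a per-codec command-sending hook. */
struct decoder {
    void (*send_cmd)(struct decoder *dec);
    enum send_path last_path;  /* records which path ran, for illustration */
};

void send_cmd_dec(struct decoder *dec)
{
    dec->last_path = SEND_DEC;   /* generic decode command stream */
}

void send_cmd_jpeg(struct decoder *dec)
{
    dec->last_path = SEND_JPEG;  /* JPEG-specific command stream */
}

/* end_frame does not need to know which codec is in use. */
void end_frame(struct decoder *dec)
{
    assert(dec != NULL && dec->send_cmd != NULL);
    dec->send_cmd(dec);
}
```

The hook would be chosen once at decoder creation, so `end_frame` itself stays free of per-codec branches.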
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
| |
Add RING_VCN_JPEG for VCN Jpeg decode, and keep RING_VCN_DEC for other codecs.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
| |
Add a new ring type for vcn jpeg.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
| |
Add VCN Jpeg decode interfaces and register defines.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
|
|
|
| |
Move radeon_decoder definition from "radeon_vcn_dec.c" to "radeon_vcn_dec.h",
so that it can be included by other files later.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|