| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
If someone did TF into a UBO, we might have left the TF job un-flushed at
the point of reading.
|
|
|
|
|
| |
This simplifies a bunch of our texture handling, while introducing the
slots necessary for adding new shader stages.
|
|
|
|
|
| |
The default attributes are long-lived (the state struct is cached), and
only 256 bytes each.
|
|
|
|
|
|
| |
Shaders are usually quite short, and are private to the context. We can
save memory and reduce the work the kernel needs to do at exec time by
packing them together in a stream uploader for long-lived state.
|
|
|
|
|
| |
We were missing the invalidate between bin and render (possibly relevant
for SSBOs), and still trying to flush the nonexistent L2C on 3.3+.
|
|
|
|
|
| |
This is a separate, dedicated hardware unit for texture layout conversions
and mipmap generation.
|
|
|
|
|
|
|
|
|
| |
The TFU lets us format raster and SAND images into formats that can be
read by the texture engine, and do mipmap generation.
The UAPI comes from drm-next e69aa5f9b97f ("Merge tag
'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc
into drm-next")
|
|
|
|
|
|
|
| |
The HW apparently has some issues (or at least a much more complicated VCM
calculation) with non-combined segments, and the closed source driver also
uses combined I/O. Until I get the last CTS failure resolved (which does
look plausibly like some VPM stomping), let's use combined I/O too.
|
|
|
|
|
|
|
|
| |
We were exposing ARB_texture_float, but apparently not the OES subset
flag. Fixes regression from GLES3 support to GLES2.
Fixes: fcf9fcee3c8a ("mesa/main: do not require float-texture filtering
for es3")
|
|
|
|
|
| |
This is the actual native format for the hardware, without swizzling.
Noticed while debugging why GLES3 disappeared.
|
|
|
|
|
|
| |
Rather than just hard-coding BRANCHSTACK size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
copy/pasta from older gens
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There is not much to do in freedreno - tile layout and multisample
state for gmem renderings is programmed based on the pfb sample count,
while resolve blits take the destination sample count from the resource.
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
User are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
A couple of simple fixes for building on Android with autotools.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't seem like the exact number has too much effect on the
performaince in "teximage". However setting it to just about anything
prevents some OOMs from getting hit. These values are not well-tuned,
but don't seem too bad.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Since the max attrib stride is 2048, the max src offset makes sense as
2047.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All TXF operations implicitly use sampler 0, and fail if it's not bound
to anything. This does not happen in LINKED_TSC mode, but we don't
currently use this.
We ensure that TSC entry at id 0 has the SRGB conversion bit enabled
(and all samplers we normally generate will too). Then when the TSC at
*slot* 0 (not to be confused with entry 0 in the global TSC table) is
unbound, we bind it to entry 0. This way, TXF operations are not
dependent on there being a regular sampler bound there.
Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are
particularly susceptible to this as they don't bind a sampler.)
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Fixes: 8d4bb6e5c (virgl: Add command and flags to initiate debugging on the host (v2))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.
Fixes a regression in multiple windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
"cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can mark the buffer unclean if it's ever bound as a TBO,
SSBO, ABO, or image.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw
from 9.58 MB/s to 451.17 MB/s.
v2: Track buffer cleanliness as a function of bindings (Ilia).
v3: virgl_modify_clean --> virgl_dirty_res (Erik)
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We flush everytime the command buffer (16 kB) is full, which is
quite costly.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw
from 111.16 MB/s to 1930.36 MB/s.
In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4,
and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc.
I didn't notice any clear differences, so let's just go with the most obvious
heuristic.
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tested running WebGL aquarium on Nvidia host (10,000 fishes)
This moves us from 7 fps to 9 fps. After quadrupling, performance
gains diminish.
v2: Remove change ID (Erik)
Tested-By: Gert Wollny <[email protected]>
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Acked-by: Christian König <[email protected]>
|
| |
|
|
|
|
| |
fill more tgsi_shader_info fields.
|
| |
|
|
|
|
| |
We need to look at the Dst for image stores.
|
|
|
|
| |
to stop using the workaround in si_make_buffer_descriptor.
|
|
|
|
|
|
| |
no change in behavior.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Ported from virglrenderer.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108311
Cc: <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.
This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.
I did my best to update r600 for consistency (r300 needs no changes
because it never calls buffer_unmap), even though the radeon winsys
ignores the new flag.
As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).
Cc: Leo Liu <[email protected]>
v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vtest doesn't implement the according API and would segfault:
Program received signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
#1 in virgl_fence_server_sync at
src/gallium/drivers/virgl/virgl_context.c:1049
#2 in st_server_wait_sync at
src/mesa/state_tracker/st_cb_syncobj.c:155
so just don't do the call when the function pointers are not set.
Fixes dEQP:
dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw
dEQP-GLES3.functional.fence_sync.wait_sync_largedraw
Fixes: d1a1c21e7621b5177febf191fcd3d3b8ef69dc96
virgl: native fence fd support
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Robert Foss <[email protected]>
|
|
|
|
|
|
| |
I've been using this with the kmsro series to test v3d on VKMS without my
old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup,
but v3d RO support seems reasonable.
|
|
|
|
|
|
|
|
|
|
| |
Since 1285f71d3e landed, it needs to provide apps with proper sample
position for MSAA.
Currently no way to query this to hw, these are taken from blob driver.
Fixes: dEQP-GLES31.functional.texture.multisample.samples_#.sample_position
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We set equiv bit in SP_FS_CTRL_REG0. Somehow the hw doesn't hang with
this mismatched config, but does run slower. It is faster with either
neither bit set, or both bits set, but both is the fastest of the three
configurations. Worth a bit over 10% gain in glmark2.
Spotted-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Jonathan Marek <[email protected]>
|
|
|
|
|
|
|
| |
blip_fp uses GENERIC as input, so blit_vp should match for linking
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Adds all missing texture related logic. For everything to work it also
needs changes to ir2/fd2_program, which are part of the ir2 update patch.
Note: it needs rnndb update
Signed-off-by: Jonathan Marek <[email protected]>
[remove stray patch]
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Note: it needs rnndb update
Signed-off-by: Marek Vasut <[email protected]>
Signed-off-by: Jonathan Marek <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|