| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
It may decrease performance and it prevents compute-based primitive culling.
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The prim discard compute shader bakes InstanceID into the output index buffer.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
If a prim discard compute shader hasn't finished compilation, we don't want
to any shader.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
It will be modified by compute shader culling.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
so that the fields can be changed by the driver.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The primitive discard compute shader will get the position output this way.
Tested-by: Dieter Nützel <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adjusts the capabilities returned
by the SWR driver and the documentation to correctly
report the following extensions:
GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array,
GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather,
GL_ARB_vertex_attrib_64bit.
Reviewed-by: Alok Hota <[email protected]>
|
|
|
|
|
|
|
| |
Since the using output optimization is only for back buffer case
Signed-off-by: Leo Liu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
transform feedback draws get the number of vertices from the transform
feedback object. In draw, we'll figure this out with the number of bytes
written divided by the stride. However, it is apparently possible we end
up with a stride of 0 there (not entirely sure it could happen with GL).
Probably when nothing was actually ever written (so we don't actually
have a stride set). Just avoid the division by zero by setting the count
to 0.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
This avoids some lower_instructions bits in st.
|
|
|
|
|
|
|
|
|
| |
This commit adds a bunch of new load/store opcodes, largely related to
OpenCL, as well as adjusting the name of existing opcodes to be more
uniform. The immediate effect is compute shaders are substantially
easier to interpret now.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Midgard ALU features two types of constants: embedded constants (128-bit
chunk, zero/one per schedule bundle) and inline constants (16-bit
splattered into the op, second source if present). Inline constants are
much more efficient from a space and scheduling freedom standpoint, so
it's desirable to inline when possible. Now that integer ops are well
understood and in use, we enable inlining of integers constants in
addition to floats (which have been inlined since forever).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
The previous commit fixes the issue this patched around.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By default, the "normal" output modifier is set on ALU ops. This is the
correct default for float outputs -- for floats, it preserves the semantic
value. Unfortunately, when used with integers, it does not preserve the
bitstream encoding, causing misbehaviour. (It's an open question what
happens when `normal` is used with integers -- does it apply some other
transformation? or does it do floating point normalization/etc on the
ints as if they were floats?).
Instead, we default to the "clamp to integer" output modifier for
ops writing integers. Semantically, this makes sense (clamping an
integer to the nearest integer is the identity function). In the
hardware with an integer opcode, this is the actual "normal".
This fixes numerous sporadic and sometimes bizarre bugs relating to
integers, especially integer moves. With this in place, we no longer
care about the types involved; it's just bits on the wire again.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From Gallium (and our) perspective, the stride of a BO is arbitrary. For
internal buffers, we can make it something nice, but for imported linear
buffers (e.g. EGL clients), we don't always have that luxury. To cope,
we calculate the expected stride of a texture, compare it to the BO's
actual reported stride, and if they differ, set the latter as a custom
stride.
Fixes rendering of windows not on tile boundaries (noticeable in Weston
with es2gears_wayland, for instance). Also, this should fix stride
issues with bufer reloading.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With a special flag, texture descriptors can include custom stride(s).
We haven't seen a case of this used for mipmaps/cubemaps, so it's not
clear how that will be encoded, but this dumps correctly for single
one-level 2D textures.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
One field was not dumped for some reason. It's observed to be 0, but
it's still good to have it available.
Also, extra fields might be snuck in the bitmaps array (it's
variable-lengthed at the end), and we want to guard against that
possibility, so we dump a little more.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.
It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
|
|
|
| |
Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY.
Make virgl_resource_transfer_prepare return an enum instead of a
bool for extensibility (e.g., instruct the callers to map
differently).
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
virgl_resource_transfer_prepare should be called before mapping to
prepare the resource. It does flush, readback, and wait as needed.
virgl_res_needs_flush and virgl_res_needs_readback become internal
helpers to the new function.
There should be no externally visible change.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
| |
Add comments and follow the coding style of virgl_res_needs_flush.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
| |
Fixes: c704c0226 ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK")
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Andrii Kryvytskyi <[email protected]>
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
There's some debate about whether we should support this on older
hardware as well. Currently i965 turns it off on Gen8- though, so
we follow suit. If this changes, we can update this as well.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Corresponding to GL_ARB_fragment_shader_interlock and
GL_NV_fragment_shader_interlock. Currently, only the NIR paths
support this functionality, but someone could conceivably add it
to TGSI too.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our tessellation control shaders can be dispatched in several modes.
- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
channel corresponding to a different patch vertex. PATCHLIST_N will
launch (N / 8) threads. If N is less than 8, some channels will be
disabled, leaving some untapped hardware capabilities. Conditionals
based on gl_InvocationID are non-uniform, which means that they'll
often have to execute both paths. However, if there are fewer than
8 vertices, all invocations will happen within a single thread, so
barriers can become no-ops, which is nice. We also burn a maximum
of 4 registers for ICP handles, so we can compile without regard for
the value of N. It also works in all cases.
- DUAL_PATCH mode processes up to two patches at a time, where the first
four channels come from patch 1, and the second group of four come
from patch 2. This tries to provide better EU utilization for small
patches (N <= 4). It cannot be used in all cases.
- 8_PATCH mode processes 8 patches at a time, with a thread launched per
vertex in the patch. Each channel corresponds to the same vertex, but
in each of the 8 patches. This utilizes all channels even for small
patches. It also makes conditions on gl_InvocationID uniform, leading
to proper jumps. Barriers, unfortunately, become real. Worse, for
PATCHLIST_N, the thread payload burns N registers for ICP handles.
This can burn up to 32 registers, or 1/4 of our register file, for
URB handles. For Vulkan (and DX), we know the number of vertices at
compile time, so we can limit the amount of waste. In GL, the patch
dimension is dynamic state, so we either would have to waste all 32
(not reasonable) or guess (badly) and recompile. This is unfortunate.
Because we can only spawn 16 thread instances, we can only use this
mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH.
This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes. We may
want to consider using 8_PATCH mode in Vulkan in some cases.
The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases. Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Add comments and some minor cleanups.
v2: document the function
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]> (v1)
Reviewed-by: Gurchetan Singh <[email protected]>
Signed-off-by: Chia-I Wu <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
| |
virgl_res_needs_flush should suffice.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Both apps and we (see virgl_buffer_transfer_flush_region) might
flush regions that are unmodified. We have to read back for those
flushes.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It currently has no user and is probably incorrect (resource_wait is
required in some more cases). Remove it so that we can focus on
transfers first.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
| |
Drop some checks that are already done by ISL.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No surface requires an auxiliary surface to operate correctly. Fall back
to an uncompressed surface if mesa fails to create and allocate an
auxiliary surface. This enables adding more restrictions to ISL without
having to update iris.
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
type provides
Otherwise we risk to read past the end of the buffer.
In addition, change the loop counters to unsigned to be consistent
with the types.
Fixes: afa8707ba93a7d226a76319acda2a8dd89524db7
softpipe: add SSBO/shader atomics support.
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
As with the previous value of 5000 we seemed to be reaching OOM in some
circumstances.
Signed-off-by: Tomeu Vizoso <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|