| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
There are PE formats not supported by RS, so we can't have a single
function to translate both.
Use RS only for same formats until we have a translate_rs_format and test
the possible different format blits.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removes the incorrect usage of translate_rs_format
* Disables use of BLT engine for different src/dst format
We only really need the BLT engine for tiling/detiling right now, but it
would be nice to support as many blit cases as possible to avoid using PE
for that.
To deal with different formats we need to:
* Have a translate_blt_format which has all supported formats
* Fix the swizzle translation from gallium (current version was wrong)
* Set the src/dst sRGB bits as needed
* Find which type conversions the BLT engine can actually do
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
In vec4, we can just not run the pass. In fs, things are a bit more
deeply intertwined.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This change makes it possible to support different downsample cases
like 4 -> 2 or 4 -> 1.
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Gert Wollny <[email protected]>
|
|
|
|
|
|
|
|
| |
This makes the streams more readable and comparable with the blob's parser
as it parses the VS and PLBU stream and shows the currently known values.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Andreas Baierl <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Change the dump so that the output looks more like the output of
mali-syscall-tracker [1].
This is a preparation for a more detailed stream analysis.
Reviewed-by: Qiang Yu <[email protected]>
Signed-off-by: Andreas Baierl <[email protected]>
[1]: https://gitlab.freedesktop.org/lima/mali-syscall-tracker
|
|
|
|
|
|
|
|
|
|
|
|
| |
GEN10_FORMAT_TABLE_INPUTS requires correction of the u_format.csv file path
in order to avoid the following build error:
ninja: error: 'external/mesa/util/format/u_format.csv',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h',
missing and no known rule to make it
Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want fma. This decreases compile times by 4% for Borderlands 2.
48505 shaders in 30515 tests
Totals:
SGPRS: 2206584 -> 2204784 (-0.08 %)
VGPRS: 1647892 -> 1648964 (0.07 %)
Spilled SGPRs: 6256 -> 6078 (-2.85 %)
Spilled VGPRs: 72 -> 72 (0.00 %)
Private memory VGPRs: 2176 -> 2176 (0.00 %)
Scratch size: 2240 -> 2240 (0.00 %) dwords per thread
Code Size: 49680804 -> 49837988 (0.32 %) bytes
LDS: 74 -> 74 (0.00 %) blocks
Max Waves: 371387 -> 371352 (-0.01 %)
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
glxgears has dead temps after lowering color inputs to load intrinsics.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
State was leaking from previous frames as we weren't updating the
descriptor in all cases.
Signed-off-by: Tomeu Vizoso <[email protected]>
Tested-by: Andre Heider <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Per the spec, the units passed to glPolygonOffset are to be multiplied
by an implementation-defined constant.
On Midgard, this constant seems to be 2.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
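A minimal sketch of the scaling described above. The factor of 2 comes from the commit message ("seems to be 2"); the macro and function names are hypothetical and are not taken from the driver.

```c
/* Hypothetical constant: the commit message reports that the
 * implementation-defined multiplier seems to be 2 on Midgard. */
#define MIDGARD_OFFSET_UNITS_SCALE 2.0f

/* Hypothetical helper: convert the `units` argument of glPolygonOffset
 * into the value that would be programmed into the hardware. */
float
scale_polygon_offset_units(float gl_units)
{
   return gl_units * MIDGARD_OFFSET_UNITS_SCALE;
}
```

Keeping the multiplier in one named constant makes it easy to adjust if another GPU generation turns out to use a different value.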
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case of glibc, pthread_t is internally a pointer. If
lp_rast_destroy() passes a 0-value pthread_t to pthread_join(), the
latter will SEGV dereferencing it.
pthread_create() can fail if either the user's ulimit -u or Linux
kernel's /proc/sys/kernel/threads-max is reached.
Choosing to continue, rather than fail, on the theory that it is better to
run with the one main thread than not run at all.
Keeping as many threads as we got, since lack of threads severely
degrades llvmpipe performance.
Signed-off-by: Nathan Kidd <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
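The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the llvmpipe code: `struct rast`, `rast_create_threads`, `rast_destroy`, and `MAX_THREADS` are hypothetical names. Only `pthread_create()` and `pthread_join()` are real APIs.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* hypothetical upper bound */

/* Hypothetical stand-in for the rasterizer state. */
struct rast {
   pthread_t threads[MAX_THREADS];
   unsigned num_threads;   /* threads that were actually created */
};

/* Trivial worker so the sketch is self-contained. */
static void *
worker(void *arg)
{
   (void)arg;
   return NULL;
}

/* Try to create `wanted` threads. On the first failure, stop and keep
 * the threads we already have instead of failing outright. */
unsigned
rast_create_threads(struct rast *r, unsigned wanted)
{
   if (wanted > MAX_THREADS)
      wanted = MAX_THREADS;

   r->num_threads = 0;
   for (unsigned i = 0; i < wanted; i++) {
      /* May fail, e.g. when ulimit -u or threads-max is reached. */
      if (pthread_create(&r->threads[i], NULL, worker, r) != 0)
         break;
      r->num_threads++;
   }
   return r->num_threads;
}

/* Join only the threads that exist, so pthread_join() never sees a
 * pthread_t that was not filled in by a successful pthread_create(). */
void
rast_destroy(struct rast *r)
{
   for (unsigned i = 0; i < r->num_threads; i++)
      pthread_join(r->threads[i], NULL);
   r->num_threads = 0;
}
```

Recording the count of successfully created threads is what lets the teardown path avoid the zero-valued `pthread_t` mentioned in the message.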
|
|
|
|
|
|
|
|
| |
So nir_validate happens properly. Unfortunately this means we have
to play the metadata song and dance, so walk over all impls and say
that we didn't hurt anything.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When demoting it from an output to a global, we need to actually move
it to the correct list. While here, we also refactor so it's clear
we aren't mutating the list while iterating.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2106
Fixes: f9fd04aca15 ("nir: Fix non-determinism in lower_global_vars_to_local")
Reviewed-by: Jason Ekstrand <[email protected]>
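As a generic illustration of moving entries between lists without following a stale link (this is not the NIR code; `struct var`, its fields, and `demote_outputs` are hypothetical), one common pattern is to walk the list through a pointer to the current link:

```c
#include <stddef.h>

/* Hypothetical variable node in a singly linked list. */
struct var {
   int demote;          /* non-zero: move from the output list to globals */
   struct var *next;
};

/* Move every node flagged `demote` from *outputs to the front of *globals.
 * The walk uses a pointer to the current link, so after unlinking a node
 * we never read the `next` field of a node that has already been moved. */
void
demote_outputs(struct var **outputs, struct var **globals)
{
   struct var **link = outputs;

   while (*link != NULL) {
      struct var *v = *link;

      if (v->demote) {
         *link = v->next;      /* unlink from the output list */
         v->next = *globals;   /* push onto the globals list */
         *globals = v;
      } else {
         link = &v->next;      /* keep the node, advance */
      }
   }
}
```

The point of the sketch is only that a demoted entry ends up on the list it now belongs to; the actual pass may structure the traversal differently.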
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
It's now unused, in favour of LCRA.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is less complicated than previously thought. Note we have no way of
specifying the work register count for blend shaders; it must be
strictly less than the work register count of the corresponding fragment
shader (which is fine since we force the fragment shader to report a
count of 16 with a blend shader as a major hack until we get register
pressure down for blend shaders).
TODO: pandecode the flags.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
This code is kinda stand-alone, and it makes it a bit easier to find the
right source in the source-tree.
|
|
|
|
|
| |
This code is kinda stand-alone, and it makes it a bit easier to find the
right source in the source-tree.
|
|
|
|
| |
This will help code-reuse a bit in the next commit.
|
|
|
|
|
| |
This code is more or less stand-alone, and this keeps the formats array
a bit more encapsulated.
|
|
|
|
|
|
|
|
|
| |
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So there is no value in having a sysval slot.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set, resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
It seems whatever was causing this is no longer an issue. So let's get
rid of the hack here.
Signed-off-by: Erik Faye-Lund <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
| |
A640 seems to work without any other changes (glmark and vkcube).
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.
[Tomeu: Only use the faster intrinsic on >T760]
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Tomeu Vizoso <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianness we care about
right now.
This fixes a bunch of piglit tests, among others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels
Signed-off-by: Erik Faye-Lund <[email protected]>
Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")
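To illustrate why the component order matters, here is a small sketch assuming a packed 32-bit depth layout with 24 bits of depth in the low bits and 8 unused bits on top (the layout Vulkan documents for VK_FORMAT_X8_D24_UNORM_PACK32). The function names are hypothetical and this is not the Zink code.

```c
#include <stdint.h>

/* Hypothetical helper: build a packed word with depth in bits 0..23
 * and the unused X byte in bits 24..31. */
uint32_t
pack_x8_d24(uint32_t depth24, uint8_t x8)
{
   return ((uint32_t)x8 << 24) | (depth24 & 0x00ffffffu);
}

/* Store the word as a little-endian system would lay it out in memory.
 * Written with explicit shifts so the result does not depend on the host. */
void
store_le32(uint32_t word, uint8_t out[4])
{
   out[0] = (uint8_t)(word & 0xff);           /* lowest depth byte first */
   out[1] = (uint8_t)((word >> 8) & 0xff);
   out[2] = (uint8_t)((word >> 16) & 0xff);
   out[3] = (uint8_t)((word >> 24) & 0xff);   /* X byte ends up last */
}
```

The packed name lists components from the most significant bits down, while the bytes in little-endian memory start with the least significant component, which is why the two naming schemes appear swapped relative to each other.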
|
|
|
|
|
|
|
|
| |
This fixes the following piglit:
spec@ati_fragment_shader@ati_fragment_shader-render-fog
Signed-off-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count. So track
these independently.
(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nops with other alu instructions, so nothing is
truly independent, but it is still useful to break these stats out.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Set flag based on actual output reg type.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We should really be setting this based on the actual output register
type.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
This partially reverts 8b30114dda8.
Fixes: 8b30114dda8 "radeonsi/nir: call nir_serialize only once per shader"
|
|
|
|
|
|
|
|
| |
We were calling it twice.
First serialize it, then use it to compute the cache key.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Mesa emulates planar format sampling with per-plane samplers. Virgl now
supports this by allowing the plane index to be passed when creating a
sampler view from a planar image. With this change, mesa now passes that
information to virgl.
Signed-off-by: David Stevens <[email protected]>
Reviewed-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Enable / add to features.txt:
- Enhanced textureGather.
- Geometry shader instancing.
- Geometry shader multiple streams.
Reviewed-by: Jan Zielinski <[email protected]>
|
|
|
|
|
|
|
| |
- Fixed setting of gl_InvocationID.
- Fixed GS vertices output memory overflow.
Reviewed-by: Jan Zielinski <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The panfrost BO cache can only grow since all newly allocated BOs are
returned to the cache (unless they've been exported).
With the MADVISE ioctl that's not a big issue because the kernel can
come and reclaim this memory, but MADVISE will only be available on 5.4
kernels. This means an app can currently allocate a lot of memory without
ever releasing it, leading to some situations where the OOM-killer kicks
in and kills the app (or even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.
Let's try to limit the number of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it is unlikely to allocate
similar BOs in the near future).
This solution is based on the VC4/V3D implementation.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
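The eviction policy described above can be sketched as follows. This is a simplified illustration, not the panfrost implementation: `struct cached_bo`, `evict_stale_bos`, and the use of whole seconds are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical cache entry; a real driver would also hold the BO itself. */
struct cached_bo {
   long last_used;            /* seconds, recorded when the BO enters the cache */
   struct cached_bo *next;    /* LRU order: least recently used entry first */
};

/* Unlink entries that have been idle for more than one second and return
 * how many were evicted. Because the list is in LRU order, the walk can
 * stop at the first entry that is still fresh. `now` is passed in so the
 * caller chooses the clock. */
unsigned
evict_stale_bos(struct cached_bo **lru_head, long now)
{
   unsigned evicted = 0;

   while (*lru_head != NULL && now - (*lru_head)->last_used > 1) {
      struct cached_bo *bo = *lru_head;

      *lru_head = bo->next;
      bo->next = NULL;   /* a real driver would release the BO here */
      evicted++;
   }
   return evicted;
}
```

Calling such a routine whenever a BO is added to or fetched from the cache bounds how long unused memory stays around, without needing a separate timer thread.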
|
|
|
|
|
|
|
|
|
| |
We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Wow. Very triangle. So shader.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When other geometry stages are present, we choose two quads and no
merged regs.
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We have tessellation state now.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation. For now, let's do
that too.
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kristian H. Kristensen <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|