| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
We will need the full info. This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
| |
It replaces virgl_context::images.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
| |
It replaces virgl_context::ssbos.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
| |
It replaces virgl_context::ubos.
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
virgl_shader_binding_state will be used to manage all per-stage
shader bindings. For now, it manages only sampler views.
This replaces virgl_textures_info and fixes some issues:
- start_slot is now honored
- views outside of [start_slot, start_slot+count) are unmodified
- views are released when the context is destroyed
Signed-off-by: Chia-I Wu <[email protected]>
Reviewed-by: Alexandros Frantzis <[email protected]>
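The three fixes listed above can be illustrated with a minimal sketch. The struct and function names here (view, binding_state, set_views, release_views) and the MAX_VIEWS limit are hypothetical stand-ins, not the actual virgl code; the sketch only assumes simple reference counting.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 16  /* hypothetical limit for this sketch */

/* Hypothetical stand-ins; not the actual virgl structures. */
struct view {
    int refcount;
};

struct binding_state {
    struct view *views[MAX_VIEWS];
};

/* Point *dst at src, taking a reference on src and dropping the old one. */
static void view_reference(struct view **dst, struct view *src)
{
    if (src)
        src->refcount++;
    if (*dst)
        (*dst)->refcount--;
    *dst = src;
}

/* Rebind only the slots in [start_slot, start_slot + count);
 * every other slot keeps whatever was bound before. */
static void set_views(struct binding_state *b, unsigned start_slot,
                      unsigned count, struct view **views)
{
    assert(start_slot + count <= MAX_VIEWS);
    for (unsigned i = 0; i < count; i++)
        view_reference(&b->views[start_slot + i], views ? views[i] : NULL);
}

/* Drop every reference, as a context destroy path would need to. */
static void release_views(struct binding_state *b)
{
    for (unsigned i = 0; i < MAX_VIEWS; i++)
        view_reference(&b->views[i], NULL);
}
```

Binding two views at slot 0 and then one view at slot 1 leaves slot 0 untouched, and release_views() returns every refcount to zero.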
|
|
|
|
|
|
| |
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
|
|
|
|
|
|
|
|
|
| |
This restriction was accidentally added to the BSpec/PRM as an
unconditional restriction starting with the HSW docs and it was never
removed. However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split out a separate program config state group to run early before the
other groups.
This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.
It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw). Re-work the GMEM code to structure things a bit closer to
the blob. This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop. But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.
Note: this is a bit on the large side for what I'd normally consider for
stable.. but right now it is looking like it is the newer fw that is
headed for linux-firmware. This should defn have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.
Cc: [email protected]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to be in a block of non buffered/context regs. Blob always
WFIs before write, so probably a good idea.
Annoyingly, compared to earlier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg; it isn't as simple as
everything below 0x2000, it seems.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases the draw for the text wasn't working. This seems to be
fixed by resyncing some of the "golden registers" from blob (initial
values were based on a somewhat older blob version).
Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.
Cc: [email protected]
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.
Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.
Cc: <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix compilation where the DWORD type is used with a format, after
-Werror-format added by c9c1e261.
Some Win32 API types are different fundamental types in the 32-bit and
64-bit versions. This problem is then further compounded by the fact
that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data
model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses
the LP64 data model. This makes it near impossible to write printf
format specifiers which are correct for all those targets.
In the Win32 API, DWORD is an unsigned, 32-bit type. So, it is defined
in terms of an unsigned long, except in the LP64 data model used by
64-bit Cygwin, where it is an unsigned int.
It should always be safe to cast it to unsigned int and use %u or %x.
Reviewed-by: Eric Anholt <[email protected]>
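A minimal sketch of the cast described above. The dword_like typedef and format_dword() are hypothetical stand-ins so the example builds without the Win32 headers; it assumes unsigned int is 32 bits wide, as it is under all the data models mentioned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DWORD. unsigned long is 32 bits under ILP32
 * and LLP64 but 64 bits under LP64, which is why %lu is not portable. */
typedef unsigned long dword_like;

/* Format the value with %u and %x after casting to unsigned int, so the
 * format string is correct however the typedef happens to be defined. */
static int format_dword(char *buf, size_t len, dword_like value)
{
    return snprintf(buf, len, "%u (0x%x)",
                    (unsigned int)value, (unsigned int)value);
}
```

The cast is lossless for any real DWORD because the value never exceeds 32 bits.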
|
|
|
|
|
| |
bind_state is possibly the worst name ever. For create, we used
create_shader_state, which is more descriptive. Put shader in the name.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I recently discovered that the following code led to valgrind errors:
    struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
    VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));
which is surprising, because struct isl_swizzle is simply:
    struct isl_swizzle {
        enum isl_channel_select r:4;
        enum isl_channel_select g:4;
        enum isl_channel_select b:4;
        enum isl_channel_select a:4;
    };
and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding. A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2). Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.
This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields. This eliminates valgrind undefined
memory warnings.
These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed. This undefined padding was being included in
the hashing, possibly leading to issues. I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.
Reviewed-by: Jason Ekstrand <[email protected]>
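The size change can be reproduced with a small sketch. The enum and struct names are hypothetical, the packed attribute is a GCC/Clang extension, and enum-typed bit-fields are implementation-defined, so this illustrates behaviour on those compilers only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, chosen to avoid implying these are the ISL types. */
enum plain_select {
    PLAIN_ZERO, PLAIN_ONE, PLAIN_RED, PLAIN_GREEN
};

/* Same values, but the enum is stored in the smallest integer type. */
enum __attribute__((packed)) packed_select {
    PACKED_ZERO, PACKED_ONE, PACKED_RED, PACKED_GREEN
};

/* Four 4-bit fields inside an int-sized unit: 16 bits used, rest padding. */
struct plain_swizzle {
    enum plain_select r:4, g:4, b:4, a:4;
};

/* Four 4-bit fields inside byte-sized units: two bytes, no padding. */
struct packed_swizzle {
    enum packed_select r:4, g:4, b:4, a:4;
};

static size_t plain_swizzle_size(void)
{
    return sizeof(struct plain_swizzle);
}

static size_t packed_swizzle_size(void)
{
    return sizeof(struct packed_swizzle);
}
```

Any bytes in the plain variant beyond the 16 bits of fields are padding, which a designated initializer is not required to zero.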
|
|
|
|
|
|
| |
These depended on the wallpaper reload.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
We were previously lowering to inand, but the second arg was not
duplicated so inot would always return ~0. Oops.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
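A small sketch of why the duplicated operand matters. The function names are hypothetical, and the broken variant assumes the unset second operand reads as zero, which is what would make the result always ~0.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise NAND on 32-bit values. */
static uint32_t inand(uint32_t a, uint32_t b)
{
    return ~(a & b);
}

/* Correct lowering: the source is used for both operands,
 * so ~(a & a) == ~a. */
static uint32_t inot_lowered(uint32_t a)
{
    return inand(a, a);
}

/* Failure mode (assumed): second operand left as zero,
 * so ~(a & 0) == ~0 regardless of a. */
static uint32_t inot_missing_dup(uint32_t a)
{
    return inand(a, 0);
}
```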
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Trivial cleanup.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This fixes bugs with complex control flow.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
This helps with debugging scheduling/emission.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Just makes it a little more obvious what's going on.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This uses the new mesa/st functionality for NIR I/O vectorization, which
eliminates a number of corner cases (resulting in assorted dEQP
failures and regressions) and should improve performance substantially due
to lessened pressure on the load/store pipe.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
This pass interfered with the more delicate path required for
non-vectorized I/O. It's also ugly and duplicating the job of an actual
honest-to-goodness scheduler.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead. The constlen logic isn't doing what it thinks it's doing:
both constlens at this point
    MAX2(s[VS].constlen, align(state->bs->constlen, 4));
are binning shader constlens. We'll have to revisit the constlen
logic, but this commit doesn't change how it works.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
a6xx only supports indirect shaders.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for refactoring fd6_draw.c a bit.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This is now supported.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec 1.1.109:
"Some implementations may need to evaluate depth image values
while performing image layout transitions. To accommodate this,
instances of the VkSampleLocationsInfoEXT structure can be
specified for each situation where an explicit or automatic
layout transition has to take place. [...] and
VkRenderPassSampleLocationsBeginInfoEXT can be chained from
VkRenderPassBeginInfo to provide sample locations for layout
transitions performed implicitly by a render pass instance."
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec 1.1.109,
"Some implementations may need to evaluate depth image values
while performing image layout transitions. To accommodate this,
instances of the VkSampleLocationsInfoEXT structure can be
specified for each situation where an explicit or automatic
layout transition has to take place. VkSampleLocationsInfoEXT
can be chained from VkImageMemoryBarrier structures to provide
sample locations for layout transitions performed by
vkCmdWaitEvents and vkCmdPipelineBarrier calls."
This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
If VK_EXT_sample_locations is used, the driver might need to emit
the sample locations specified during layout transitions.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used for the depth decompress pass that might need
to emit variable sample locations during layout transitions.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-By: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
We run a ton of backend specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db stats courtesy of Eric Anholt:
total instructions in shared programs: 6480215 -> 6475457 (-0.07%)
instructions in affected programs: 662105 -> 657347 (-0.72%)
helped: 1209
HURT: 13
total constlen in shared programs: 1432704 -> 1427769 (-0.34%)
constlen in affected programs: 100063 -> 95128 (-4.93%)
helped: 512
HURT: 0
total max_sun in shared programs: 875561 -> 873387 (-0.25%)
max_sun in affected programs: 46179 -> 44005 (-4.71%)
helped: 1087
HURT: 0
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, ir3 backend compiler is lowering integer multiplication from:
    dst = a * b
to:
    dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)
by emitting this code:
    mull.u tmp0, a, b           ; mul low, i.e. al * bl
    madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
    madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16
which at that point has very low chances of being optimized.
This patch adds a new nir_algebraic.AlgebraicPass to perform this
lowering during NIR algebraic optimization passes, giving it a better
chance for optimizing the resulting code.
Reviewed-by: Eric Anholt <[email protected]>
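The decomposition above can be checked with a scalar sketch. The C functions below are hypothetical models of mull.u and madsh.m16 built from the comments in this message, not the hardware definition; the ah * bh term is dropped because it would be shifted left by 32 bits.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lo16(uint32_t x) { return x & 0xffffu; }
static uint32_t hi16(uint32_t x) { return x >> 16; }

/* Model of mull.u: low 16 bits of each source multiplied (al * bl). */
static uint32_t mull_u(uint32_t a, uint32_t b)
{
    return lo16(a) * lo16(b);
}

/* Model of madsh.m16: (high(a) * low(b) << 16) + c, wrapping at 32 bits. */
static uint32_t madsh_m16(uint32_t a, uint32_t b, uint32_t c)
{
    return ((hi16(a) * lo16(b)) << 16) + c;
}

/* The three-instruction sequence from the commit message. */
static uint32_t imul_lowered(uint32_t a, uint32_t b)
{
    uint32_t tmp0 = mull_u(a, b);          /* al * bl */
    uint32_t tmp1 = madsh_m16(a, b, tmp0); /* + (ah * bl << 16) */
    return madsh_m16(b, a, tmp1);          /* + (al * bh << 16) */
}
```

For any pair of inputs, imul_lowered(a, b) should equal the wrapping 32-bit product a * b.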
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For umul_low (al * bl), zero is returned if the low 16-bit word of either
source is zero.
For imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.
A couple of nir_search_helpers are added:
is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src is zero.
is_lower_half_zero() returns true if the lowest word of all components of
an integer NIR alu src is zero.
Reviewed-by: Eric Anholt <[email protected]>
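The two simplifications follow directly from the definitions, as this scalar sketch shows. The predicates below are hypothetical single-value stand-ins for the NIR helpers (which inspect every component of an alu src), and umul_low() and imadsh_mix16() are models based on the formulas in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar stand-ins for the search helpers (hypothetical names). */
static bool upper_half_zero(uint32_t x) { return (x >> 16) == 0; }
static bool lower_half_zero(uint32_t x) { return (x & 0xffffu) == 0; }

/* al * bl */
static uint32_t umul_low(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) * (b & 0xffffu);
}

/* (ah * bl << 16) + c */
static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
    return (((a >> 16) * (b & 0xffffu)) << 16) + c;
}

/* Returns true when the rewrite "umul_low(a, b) -> 0" is valid. */
static bool umul_low_folds_to_zero(uint32_t a, uint32_t b)
{
    return lower_half_zero(a) || lower_half_zero(b);
}

/* Returns true when the rewrite "imadsh_mix16(a, b, c) -> c" is valid. */
static bool imadsh_folds_to_c(uint32_t a, uint32_t b)
{
    return upper_half_zero(a) || lower_half_zero(b);
}
```

Whenever one of the fold predicates holds, evaluating the full operation gives the folded value, so the rewrite does not change results.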
|
|
|
|
|
|
| |
They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'umul_low' is the low 32 bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.
'imadsh_mix16' is multiply add with shift and mix, an ir3 specific
instruction that maps directly to ir3's MADSH_M16.
Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
We still need to emit them in V3D 3.x since there is no mechanism to
disable them.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 2282ec0a refactored drawable creation across various platforms
into a new dri2_create_drawable helper function.
The GBM code in platform_drm.c passed in dri2_surf->gbm_surf as the
loaderPrivate, while most other backends passed in dri2_surf directly.
To try and handle this, the patch checked if dri2_surf->gbm_surf was
non-NULL, and if so, presumed that the caller is the DRM platform and
we should use the dri2_surf->gbm_surf pointer.
This worked for most platforms, which calloc their dri2_surf structure,
zeroing the data. Unfortunately, platform_x11.c used malloc, leaving
most of the dri2_surf as garbage. In particular, dri2_surf->gbm_surf
was often non-NULL, causing dri2_create_drawable to try and use it,
passing a garbage pointer to the createNewDrawable hook, usually leading
to a SIGBUS or SIGSEGV when trying to dereference that bad pointer.
Since most callers calloc the data, make platform_x11.c follow suit.
Fixes crashes with i915_dri.so when running dEQP-GLES2.
Reviewed-by: Mathias Fröhlich <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
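The failure mode can be shown with a reduced sketch. struct surf, loader_private_for() and surf_create() are hypothetical stand-ins for the EGL surface struct and the helper's check; the sketch assumes an all-zero-bits pointer is NULL, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the surface struct. */
struct surf {
    void *gbm_surf; /* only ever set by the DRM platform */
    int width;
};

/* Mirrors the check described above: use gbm_surf when it is non-NULL,
 * otherwise fall back to the surface itself. With malloc'ed memory the
 * gbm_surf field is indeterminate, so this could pick a garbage pointer. */
static void *loader_private_for(struct surf *s)
{
    return s->gbm_surf ? s->gbm_surf : (void *)s;
}

/* calloc zero-fills the allocation, so gbm_surf starts out NULL and the
 * check above takes the fallback path on non-DRM platforms. */
static struct surf *surf_create(void)
{
    return calloc(1, sizeof(struct surf));
}
```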
|