In order to make ARB_shader_image_load_store work, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.
Signed-off-by: Dave Airlie <[email protected]>
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable. Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.
Reviewed-by: Topi Pohjolainen <[email protected]>
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Select the higher of the current 1G default or 10% of the filesystem
where the cache is located.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Grazvydas Ignotas <[email protected]>
Currently eviction is only one in, one out, so if the cache is at
max_size and cache files constantly increase in size, then so does the
cache. Allow multiple evictions instead, with a limit of 8 per new
cache entry.
V2: (Timothy Arceri) fix make check tests
Reviewed-by: Grazvydas Ignotas <[email protected]>
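A bounded eviction loop of this kind might look like the sketch below. The
toy_cache structure and its functions are hypothetical stand-ins: the real
cache lives on disk and picks its victim differently, while this toy version
keeps sizes in an array and simply drops the most recent entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVICTIONS_PER_PUT 8
#define TOY_CAPACITY 64

/* Hypothetical in-memory stand-in for the on-disk cache. */
struct toy_cache {
   uint64_t sizes[TOY_CAPACITY]; /* size of each stored entry */
   size_t count;                 /* number of stored entries */
   uint64_t total;               /* sum of all entry sizes */
   uint64_t max_size;            /* size budget */
};

/* Drop one entry; returns 0 if there was nothing to evict. */
int
toy_cache_evict_one(struct toy_cache *c)
{
   assert(c != NULL);
   if (c->count == 0)
      return 0;
   c->total -= c->sizes[--c->count];
   return 1;
}

/* Add an entry, evicting at most MAX_EVICTIONS_PER_PUT old entries
 * while the cache would exceed max_size. Returns the eviction count. */
int
toy_cache_put(struct toy_cache *c, uint64_t new_size)
{
   int evicted = 0;

   assert(c != NULL);
   while (c->total + new_size > c->max_size &&
          evicted < MAX_EVICTIONS_PER_PUT) {
      if (!toy_cache_evict_one(c))
         break;
      evicted++;
   }

   if (c->count < TOY_CAPACITY) {
      c->sizes[c->count++] = new_size;
      c->total += new_size;
   }
   return evicted;
}
```

The cap keeps the cost of a single put bounded, at the price of letting the
cache overshoot max_size when one very large entry arrives.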
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
v2: Factor out double strlen call
v3: C99 declaration of variables where used
Reviewed-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <[email protected]>
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call disk_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <[email protected]>
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Grazvydas Ignotas <[email protected]>
LLVM 4.0 was released with a pretty messy regression that will hopefully
get fixed in the future.
This workaround was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead it forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Dave Airlie <[email protected]>
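For reference, the two encodings discussed above compare as in the sketch
below. The macros are local copies of the Vulkan 1.0 header packing (10 bits
major, 10 bits minor, 12 bits patch) so that the example builds without
vulkan.h. The 17.1.0 version is only an illustration, inferred from the
17001000 alternative mentioned above, and pack_version() is a hypothetical
helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the Vulkan 1.0 VK_MAKE_VERSION packing. */
#define MAKE_VERSION(major, minor, patch) \
   ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | ((uint32_t)(patch)))

/* Matching unpack helpers, as a site decoding the value would use. */
#define VERSION_MAJOR(v) ((uint32_t)(v) >> 22)
#define VERSION_MINOR(v) (((uint32_t)(v) >> 12) & 0x3ffu)
#define VERSION_PATCH(v) ((uint32_t)(v) & 0xfffu)

/* Hypothetical helper: pack a driver version, checking field widths. */
uint32_t
pack_version(uint32_t major, uint32_t minor, uint32_t patch)
{
   assert(major < (1u << 10) && minor < (1u << 10) && patch < (1u << 12));
   return MAKE_VERSION(major, minor, patch);
}
```

With this packing, 17.1.0 becomes 71307264, which a tool that prints the raw
number shows as-is, while a decoder that assumes the same layout recovers
17, 1 and 0.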
I've skimmed the changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant of course, but this should not
regress anything.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Dave Airlie <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <[email protected]>
Cc: 17.0 <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
The flushes could be due to TRANSFER barriers.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Cc: 17.0 <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <[email protected]>
Tested-by: Eric Engestrom <[email protected]> (GNU sed)
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <[email protected]>
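The naming scheme described above can be illustrated with a small name
builder. This is a sketch of the convention only, not the generated NIR
table: conversion_op_name() is a hypothetical helper, and types are
abbreviated to the single letters f, i, u and b.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a conversion opcode name from one-letter type codes
 * ('f' float, 'i' signed int, 'u' unsigned int, 'b' bool).
 * Returns the name length, or -1 for a form that no longer exists. */
int
conversion_op_name(char *buf, size_t len, char src, char dst, unsigned dst_bits)
{
   assert(buf != NULL);

   /* i2u and u2i were removed; integer conversions are i2i or u2u. */
   if ((src == 'i' && dst == 'u') || (src == 'u' && dst == 'i'))
      return -1;

   /* Bool conversions carry no size suffix: <src_type>2<dst_type>. */
   if (src == 'b' || dst == 'b')
      return snprintf(buf, len, "%c2%c", src, dst);

   /* Everything else: <src_type>2<dst_type><size>. */
   return snprintf(buf, len, "%c2%c%u", src, dst, dst_bits);
}
```

For example, a float to 32-bit signed integer conversion is named f2i32, a
sign-extending integer conversion to 64 bits is i2i64, and an integer to
bool conversion is i2b.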
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Using the helper is way better than hand-coding the universe.
Reviewed-by: Eric Anholt <[email protected]>
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <[email protected]>
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with fewer components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: "17.0 13.0" <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: "17.0" <[email protected]>
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "17.0" <[email protected]>
otherwise generated entrypoint headers are not found during build
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
fixes build error when brw_nir.h not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
This patch adds missing error checking and fixes a resource leak in the
allocation failure path of anv_CreateDevice().
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes: 682248db451f ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <[email protected]>
This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.
Cc: 13.0 17.0 <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
otherwise, cached shaders aren't dumped.
Reviewed-by: Timothy Arceri <[email protected]>
No intended change in behavior. Just a refactor.
Reviewed-by: Jason Ekstrand <[email protected]>
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <[email protected]>
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <[email protected]>
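The output array convention referred to above is the usual two-call pattern:
a count pointer plus an optional array pointer. The sketch below shows that
convention with hypothetical toy_* names and a plain int payload. It is not
the vk_outarray API itself, and it adds an incomplete status in the style of
the Vulkan enumeration functions that return VK_INCOMPLETE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, mirroring success / incomplete. */
enum toy_result { TOY_SUCCESS = 0, TOY_INCOMPLETE = 1 };

/* Hypothetical data the "driver" wants to report. */
static const int toy_available[] = { 10, 20, 30 };
#define TOY_AVAILABLE_COUNT 3u

/* If items is NULL, only report how many entries exist. Otherwise copy
 * at most *count entries, write back how many were copied, and report
 * whether the caller's array was too small to hold everything. */
enum toy_result
toy_enumerate(uint32_t *count, int *items)
{
   uint32_t n;

   assert(count != NULL);
   if (items == NULL) {
      *count = TOY_AVAILABLE_COUNT;
      return TOY_SUCCESS;
   }

   n = *count < TOY_AVAILABLE_COUNT ? *count : TOY_AVAILABLE_COUNT;
   for (uint32_t i = 0; i < n; i++)
      items[i] = toy_available[i];
   *count = n;

   return n < TOY_AVAILABLE_COUNT ? TOY_INCOMPLETE : TOY_SUCCESS;
}
```

A caller typically calls once with a NULL array to learn the count, allocates
storage, and calls again to fetch the entries. A wrapper exists to avoid
hand-writing the clamping and status logic in every entry point.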
Before this change, the generator could print this kind of thing:
    const uint32_t v0 =
       __gen_uint(values->ValidBit, 0, 0) |
       __gen_uint(values->FaultType, 1, 2) |
       __gen_uint(values->SRCIDofFault, 3, 10) |
       __gen_uint(values->GTTSEL, 11, 1) |
    dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);
This change fixes the trailing '|'.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
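The general way to avoid a dangling operator is to emit the separator before
every term except the first, rather than after every term. The sketch below
shows that idea in C with a hypothetical join_or_terms() helper. It is not
the generator's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join n terms with " | " between them and nothing after the last one.
 * Returns the resulting length, or -1 if the buffer is too small. */
int
join_or_terms(char *out, size_t len, const char *const *terms, size_t n)
{
   size_t used = 0;

   assert(out != NULL && terms != NULL);
   if (len == 0)
      return -1;
   out[0] = '\0';

   for (size_t i = 0; i < n; i++) {
      /* The separator goes in front of every term except the first. */
      int w = snprintf(out + used, len - used, "%s%s",
                       i ? " | " : "", terms[i]);
      if (w < 0 || (size_t)w >= len - used)
         return -1;
      used += (size_t)w;
   }
   return (int)used;
}
```

Because the separator is tied to the following term, the output is well
formed for any number of terms, including one.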
Fixes the following segmentation fault:
signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
167
168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
169
-> 170 vertices[num++] = (float) x1;
171 vertices[num++] = (float) y1;
172
173 vertices[num++] = (float) x1;
(lldb) bt
* frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush
Signed-off-by: Marek Olšák <[email protected]>