| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since we really cannot share them ever.
Also remove an unused switch.
Fixes: b70829708ac "radv: Implement VK_KHR_external_memory"
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Derived from the Intel code.
For the internal format we just use the internal Vulkan format,
as we have Vulkan formats for all android formats we care about.
For the ycbcr properties we just do something. I do not have a real
clue what would be recommended.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
You cannot go from BO to Android hardware buffer, so for export we
have to remember it.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Still disabled but now we can add entrypoints.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
For better test coverage of this corner case.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The minigbm comment really says it all. We should
fix minigbm as well, but for now this is the more
robust solution.
Note that this only changes width and height for
the surface creation, not for the image and hence
also not for the sampler, where it would wreak
havoc due to the normalized coords.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want this flexibility because in GFX10 we lose any stride fields,
so we have to make sure our width/height are in alignment with
the external image we import.
Furthermore, we need the ability to inject tiling modifiers on import
time which is strictly after create time for Android. So, with the
layout & patch functions being fully independent of pCreateInfo, we
can delay it until import/bind time.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
So we can delay the layout until later in some import cases.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Less duplication/complexity.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Unused stride/offset args.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the timestamp is not ready (ie. UINT64_MAX), the availabily bit
should be zero. The previous code used to copy the timestamp value
as the availabily bit and that's completely wrong.
Because it's not that simple to emit a conditional with the CP, the
driver now uses a compute shader for copying timestamp query results.
Fixes dEQP-VK.pipeline.timestamp.misc_tests.reset_query_before_copy.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise, the GPU might write timestamp queries after the reset
operation. This is similar to other query operations.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The spec has probably been misinterpreted during RADV bringup.
This fixes GPU hangs with dEQP-VK.binding_model.*offset_nonzero*.
Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
NIR->LLVM and ACO already support nir_intrinsic_shader_clock.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is a step towards the goal of being able to build RADV
without LLVM. In the future we would like to offer the option to
use RADV solely with ACO. There is still a need for the common AMD
code located in amd/common but the LLVM specific parts need to be
separated.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Acked-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This simplifies ACO and allows the lowered code to be optimized (in
particular, constant folded).
Totals from affected shaders:
SGPRS: 1776 -> 1776 (0.00 %)
VGPRS: 1436 -> 1436 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 203452 -> 203564 (0.06 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 103 -> 103 (0.00 %)
At least some of the code size increase seems to be from literals being
applied to instructions as a result of constant folding.
v2: remove fmod/frem handling in init_context()
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This lowers fmod and frem at NIR level like RadeonSI. fmod is
already lowered directly in NIR->LLVM, and frem will be lowered by
LLVM anyways.
This fixes a LLVM crash with:
dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need the continue CS for referencing the tess/GDS/sample position BOs.
Fixes: 46e52df34d3 "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab7534 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f30 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
Random hangs no longer happen, I'm actually not sure if they were
related to this.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Forgot to amend the commit before updating the MR.
Signed-off-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
| |
This was actually the wrong fix.
This reverts commit 0a313cc285c2939de9cac07f045b0b699bc208ca.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Make sure to export the expected clear values to the depth
stencil attachment.
This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.
This fixes a bunch of CTS failures when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
It's unnecessary to store/load more components that needed.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.
This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
The buffer isn't necessarily used before.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Trivial.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
40228 shaders in 20236 tests
Totals:
SGPRS: 2045512 -> 2046496 (0.05 %)
VGPRS: 1430856 -> 1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 77202840 -> 77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Android building rules are added in src/amd/Android.compiler.mk
libmesa_aco static library is built conditionally to radeonsi
as done for vulkan.radv module
This will prevent Android build errors for non x86 systems
filter-out compiler/aco_instruction_selection_setup.cpp source,
as already included by compiler/aco_instruction_selection.cpp
and would cause several multiple definition linker errors
NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco
Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
Signed-off-by: Mauro Rossi <[email protected]>
|
|
|
|
|
|
|
|
| |
According to radeonsi, GLM doesn't support WB alone, so
we have to set INV too when WB is set.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This new option can help debug shader compiler problems when
there are issues with the meta shaders.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Add a function called ac_get_fs_input_vgpr_cnt which will return
the number of input VGPRs used by an AMD shader. Previously,
radv and radeonsi had the same code duplicated, but this commit also
allows them to share this code.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This commit allows RADV to set the shared VGPR count according to
the shader config.
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We were setting this twice. The second time, we weren't later disabling
it if unsupported.
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Daniel Schürmann <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Fixes: cdc6efddf918bc07d30d ("radv: implement all depth/stencil resolve modes using graphics")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Released today and hangs on RADV. We don't have the root cause yet,
but this should unblock people playing the game.
No drirc because the radv debugflags are not usable from drirc and
I want this backported.
CC: <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
LLVM does this anyway, but for ACO we need to do it in NIR.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
For now, this extension will only be enabled for ACO.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
These work with both, LLVM and ACO.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco.
Co-authored-by: Daniel Schürmann <[email protected]>
Co-authored-by: Rhys Perry <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently we already enabled it without having support ...
Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.
Doubles fillrate in my dual_quad_bench from ~16 pixels/cycles to
~32 pixels/cycle on a Raven.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
When actually implementing it, Talos on low is still 3% slower.
Reviewed-by: Samuel Pitoiset <[email protected]>
|