| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
It makes more sense to rely on nir_intrinsic_load_push_constant
instead of the pipeline layout.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
nir_to_llvm_context will always be NULL for radeonsi so we need
work around this.
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the following piglit tests in radeonsi:
vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test
vs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tes-tessinner-tessouter-inputs-tris.shader_test
v2: make use of si_shader_io_get_unique_index_patch()
via the helper in the previous patch rather than
shader_io_get_unique_index()
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For Vega10 and Raven that need a special workaround for the
scissor bug.
This seems to give a minor boost for Talos and Dota 2, at least.
To reduce the cost of memcmp, the driver checks if it's
really useful to do the comparison.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Needed for the following commit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
Found by inspection.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to RadeonSI.
This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.
Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Simplifies the logic a little and asserts index is 0.
Suggested-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
We will call these from the radeonsi NIR backend.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This also enables some code sharing with tes.
V2: drop type param and just use ctx->i32
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
V2: drop type param and just use ctx->i32
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Copied from radeonsi.
Putting in the correct metadata flush commands for eventually not
flushing L2 on CB/DB switch.
Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT
at the moment.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
These are just shaders reads, so we need to invalidate L1.
Fixes: 6dbb0eaccc "radv: handle subpass cache flushes"
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
This can still be improved, but let's start with this.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It makes more sense to move all scan stuff in the same place.
Also, we don't really need to duplicate the uses_primid field
for each stages.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
sync_files are in linux since 4.7, while the amdgpu fence_to_handle
ioctl is only in 4.15.
In particular we don't need it for sync_file in radv, because
everything happens via syncobjs, which got support earlier than
fence_to_handle.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
When rasterization is disabled we can have that few.
Fixes: 76603aa90b8 "radv: Drop the default viewport when 0 viewports are given."
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Seems like users are actually hitting 0xFFFFFFFF actually making
things broken for them, and the mad max regression is fixed, so
lets put this in once more.
v2: Use 0xf for depth-only htile. (Dave)
Fixes: af2844116fd "radv: Revert HTILE reset word to 0xFFFFFFFF."
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.
But with the code upstream it is easier to experiment with it.
v2: Remove initializing the registers from si_emit_config.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Letting it be disabled by default.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Those are implemented as texture sampling, so we need to make the
texture TC-compatible too.
Fixes: 34d23e82ca9 "radv: set some dcc parameters depending on if texture will be sampled"
Reviewed-by: Fredrik Höglund <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Before this DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% perf increase on a
bunch of games on vega @ 4k.
Reviewed-by: Dave Airlie <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If both source and destination are DCC compressed, and their formats
are not compatible, we need to decompress one of them to make
sure we can do reinterpretation (which needs src format == dst format)
.
Reviewed-by: Dave Airlie <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.
This is also going to help implementing other stuff.
Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.
Reviewed-by: Dave Airlie <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
| |
It should already be valid there + the RB will update it during
rendering.
Reviewed-by: Dave Airlie <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|