| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:
https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md
The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.
The main changes in this patch are:
- We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
- Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.
Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically. However, in Mesa we let the driver do
this work. The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
v2:
- Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
v3:
- Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.
26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: don't call load_input for fragment shaders in emit_declaration
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was missed in:
commit 0d2e43fcb1198a6e67c85feadb1ca8c360ddc284
Author: Marek Olšák <[email protected]>
Date: Thu Aug 18 16:30:00 2016 +0200
gallium/radeon: derive buffer placement and flags only at initialization
Tested-by: Michel Dänzer <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Fixes segfaults in EG compute since:
commit 21de3be8e62b2b093569a99550e6356ed2f106b4
radeonsi: fix texture format reinterpretation with DCC
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
In some places (e.g. shader program pointers) we require 256 bytes alignment.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
not used in any useful way
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
There is nothing special happening in those code blocks.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The problem is that TC-compatible DCC clear codes translate
into different clear values when you change the format.
I have a new piglit reproducing the issue.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
It should be possible to get TC-compatible fast clear more often now.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
| |
|
|
|
|
|
|
|
| |
This just moves these to a common header file.
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Step one to merging radv would be to move some files around.
This only adds the include path to r600/radeonsi, because later
we want to avoid having to add it to the generic target paths.
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calculate depth ranges from viewport states and
pipe_rasterizer_state::clip_halfz.
The evergreend.h change is required to silence a warning.
This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
num-cs-flushes will mean compute shader flushes
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
DCC is limited in how texture formats can be reinterpreted using texture
views. If we get a view format that is incompatible with the initial
texture format with respect to DCC, disable DCC.
There is a new piglit which tests all format combinations.
What works and what doesn't was deduced by looking at the piglit failures.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
just do what the comment says
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
For coherency with the current context.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
For coherency with the current context.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Invalidated buffers don't have to go through it.
Split r600_init_resource into r600_init_resource_fields and
r600_alloc_resource.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few weeks ago, Jose Fonseca suggested [0] we use .editorconfig files
to try and enforce the formatting of the code, to which Michel Dänzer
suggested [1] we start by importing the existing .dir-locals.el
settings. The first draft was discussed in the RFC [2].
These .editorconfig are a first step, one that has the advantage of
requiring little to no intervention from the devs once the settings
files are in place, but the settings are very limited. This does have
the advantage of applying while the code is being written.
This doesn't replace the need for more comprehensive formatting tools
such as clang-format & clang-tidy, but those reformat the code after
the fact.
[0] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121545.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-June/121639.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123431.html
Acked-by: Nicolai Hähnle <[email protected]>
Acked-by: Eric Anholt <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: document the new cap
v3: fix 80 char limit in screen.rst
Signed-off-by: Jan Vesely <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
This fixes: GL45-CTS.texture_barrier.*
Tested-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
If the kernel driver doesn't support it, it returns 0.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
there's no reason to separate these
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
It crashes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
|
|
|
|
|
|
|
| |
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.
This fixes unflushed fences that can get stuck due to empty IBs.
|
|
|
|
|
|
|
|
| |
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Set the flag on when dual instance encoding is supported,
otherwise set it to off.
Signed-off-by: Boyuan Zhang <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
| |
just FYI, the kernel receives priority/4
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
free 60..63, move CP_DMA up
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
and rename the enum
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is just a workaround. The problem is described in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541
v2: say that it's only between the current context and aux_context
Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
They can lead to VM faults and worse, which goes against the GL robustness
promises.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: take actual writemasks into account
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Also, prepare for using tgsi_array_info.
This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.
This also allows us to get rid of get_alloca_for_array.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
To have the same signature as get_array_range.
Reviewed-by: Marek Olšák <[email protected]>
|