| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
so that we can put multiple different TGSI shaders into one module.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This code can be shared by radv, we bump the max to
VARYING_SLOT_MAX here, but that shouldn't have too
much fallout.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
47109 shaders in 29632 tests
Totals:
SGPRS: 1917364 -> 1916620 (-0.04 %)
VGPRS: 1165802 -> 1165202 (-0.05 %)
Spilled SGPRs: 1880 -> 1843 (-1.97 %)
Spilled VGPRs: 70 -> 65 (-7.14 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1312 -> 1308 (-0.30 %) dwords per thread
Code Size: 60211356 -> 60192268 (-0.03 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 428597 -> 428674 (0.02 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 238173 -> 237429 (-0.31 %)
VGPRS: 149556 -> 148956 (-0.40 %)
Spilled SGPRs: 1263 -> 1226 (-2.93 %)
Spilled VGPRs: 25 -> 20 (-20.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 20 -> 16 (-20.00 %) dwords per thread
Code Size: 10457904 -> 10438816 (-0.18 %) bytes
LDS: 50 -> 50 (0.00 %) blocks
Max Waves: 41283 -> 41360 (0.19 %)
Wait states: 0 -> 0 (0.00 %)
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Because the buffer is new, it can't be referenced by any CS.
This can save few CPU cycles by skipping the whole
PIPE_TRANSFER_UNSYNCHRONIZED if in amdgpu_bo_map().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
A merged LS-HS shader needs both fix_fetch and inputs_to_copy
for compilation.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.0 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are 2 major hw changes:
- The address must always point to the address of level 0. GFX9 tiling
modes don't allow binding to a non-0 level.
- 3D must always be bound as 3D, because 2D and 3D use entirely different
tiling modes, and the texture target determines which set of modes is
used.
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.
+2% FPS for glxgears.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
cleanup
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
The VS state sets it.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
because the compression is skipped with non-dirty textures.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
For robustness and testing purposes.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
way they're implemented, the VGT always generates indices starting at 0,
and the VS prolog adds the start index.
There's a VGT_INDX_OFFSET register which causes the VGT to start at a
driver-defined index. However, this register cannot be written from
indirect draws.
So fix this unlikely case by setting a bit to tell the VS whether the
draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly
when used.
Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
v2: fix incorrect change in get_tcs_out_patch_stride
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Avoid conflicts when merging various VS state bits.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
We will merge other derived state information into this register.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Especially with subsequent changes, this makes it easier to see the
sequence of state emits at the higher level.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
It is unused.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
For bindless.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
For bindless.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: - rename to depth_needs_decompression() instead
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
No need to check all color buffers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
No need to compute the offset in the descriptor twice.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into
putting the lane select of llvm.amdgcn.readlane into a VGPR and then
fails to continue to compile.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Notably, llvm.amdgcn.readfirstlane and llvm.amdgcn.icmp may be hoisted
out of loops or if/else branches in cases like
if (cond) {
v = readFirstInvocationARB(x);
... use v ...
} else {
v = readFirstInvocationARB(x);
... use v ...
}
===>
v = readFirstInvocationARB(x);
if (cond) {
... use v ...
} else {
... use v ...
}
The optimization barrier is a heavy hammer to stop that until LLVM
is taught the semantics of the intrinsic properly.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM will lift inline assembly out of if-else-blocks if both paths have
the same inline assembly. Prevent this by adding an irrelevant unique
text to the assembly.
This requires the LLVM assembly parser to be initialized.
Furthermore, allow forcing subsequent computations to happen after the
optimization barrier by defining a data dependency.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
64-bit system values are stored as v2i32 to simplify the fetch logic.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
For simplicitly, always store system values as 32-bit values or arrays
of 32-bit values. 64-bit values are unpacked and packed accordingly.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
ARB_shader_ballot introduces 7 new system values that can be used
in all shader stages.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
- fill in DRM version requirement
- disable on SI due to CP DMA faults
Reviewed-by: Marek Olšák <[email protected]>
|