| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
useful for radeonsi performance counters
v2: allow specifying both : and =
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's redundant with the source modifier.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It's redundant with the source modifier.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
This is required to drop gallium SUB.
Signed-off-by: Axel Davy <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This is required to remove gallium SUB.
Signed-off-by: Axel Davy <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This is required for gallium SUB and ABS to be removed.
Signed-off-by: Axel Davy <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes the mistake introduced in commit
b6737a8bcd03ea68952799144c0c6e6e6679bee9
Signed-off-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
Old test caused breakage with llvm-svn (4.0.0svn), and not needed as
the minimum required llvm version for swr is 3.6.
Reviewed-by: George Kyriazis <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
wrap lp_bld_type.h around extern "C".
Windows decorates global variables, so when used from .cpp files, need
to use an undecorated version.
Also, removed related and unneeded code from swr_screen.cpp
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to fix the GPU hangs caused by:
commit ed3190b3f3a776fc8c75b1e6130a88079166d115
Author: Marek Olšák <[email protected]>
Date: Sun Nov 13 18:41:43 2016 +0100
radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219
Tested-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Drivers with good compilers don't need aggressive optimizations before TGSI.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
The context may be used by texture_get_handle.
Reviewed-by: Christian König <[email protected]>
Cc: 13.0 <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The context may be used by texture_get_handle.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99158
Reviewed-by: Christian König <[email protected]>
Cc: 13.0 <[email protected]>
|
|
|
|
|
|
|
|
| |
Useful when debugging with R600_DEBUG=vm,check_vm to match
addr in both outputs.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
It's more user friendly and it avoids to write files in unexpected
places.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure unused ops and their references are removed, prior to entering
the GCM (global code motion) pass, to stop GCM from breaking the loop
logic and thus hanging the GPU.
Turns out, that sb has problems with loops and node optimizations
regarding associative folding:
- the global code motion (gcm) pass moves ops up a loop level/basic block
until they've fulfilled their total usage count
- if there are ops folded into others, the usage count won't be
fulfilled and thus the op moved way up to the top
- within GCM the op would be visited and their deps would be moved
alongside it, to fulfill the src constaints
- in a loop, an unused op is moved out of the loop and GCM would move
the src value ops up as well
- now here arises the problem: if the loop counter is one of the src
values it would get moved up as well, the loop break condition would
never get hit and the shader turn into an endless loop, resulting in the
GPU hanging and being reset
A reduced (albeit nonsense) piglit example would be:
[require]
GLSL >= 1.20
[fragment shader]
uniform int SIZE;
uniform vec4 lights[512];
void main()
{
float x = 0;
for(int i = 0; i < SIZE; i++)
x += lights[2*i+1].x;
}
[test]
uniform int SIZE 1
draw rect -1 -1 2 2
Which gets optimized to:
===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
===== 42 dw ===== 1 gprs ===== 2 stack =========================================
ALU 3 @24
1 y: MOV R0.y, 0
t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z
LOOP_START_DX10 @22
PUSH @6
ALU 1 @30 KC0[CB0:0-15]
2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
JUMP @14 POP:1
LOOP_BREAK @20
POP @14 POP:1
ALU 2 @32
3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x
TEX 1 @36
VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
ALU 1 @40
4 y: ADD R0.y, R0.y, R0.x
LOOP_END @4
EXPORT_DONE PIXEL 0 R0.____ EOP
===== SHADER_END ===============================================================
Notice R0.z being the loop counter/break condition relevant register
and being never incremented at all. Also some of the loop content
has been moved out of it, to fulfill the requirements for the one unused
op.
With a debug build of mesa this would produce an error like
error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||[email protected], C0.x
: operand value R1.x.2||[email protected] was not previously written to its gpr
and the compilation would fail due to this. On a release build it gets
passed to the GPU.
When using this patch, the loop remains intact:
===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN =====
===== 48 dw ===== 1 gprs ===== 2 stack =========================================
ALU 2 @24
1 y: MOV R0.y, 0
z: MOV R0.z, 0
LOOP_START_DX10 @22
PUSH @6
ALU 1 @28 KC0[CB0:0-15]
2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x
JUMP @14 POP:1
LOOP_BREAK @20
POP @14 POP:1
ALU 4 @30
3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z
4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x
TEX 1 @40
VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..]
ALU 2 @44
5 y: ADD R0.y, R0.y, R0.x
z: ADD_INT R0.z, R0.z, 1
LOOP_END @4
EXPORT_DONE PIXEL 0 R0.____ EOP
===== SHADER_END ===============================================================
Piglit: ./piglit summary console -d results/*_gpu_noglx
name: unpatched_gpu_noglx patched_gpu_noglx
---- ------------------- -----------------
pass: 18016 18021
fail: 748 743
crash: 7 7
skip: 1124 1124
timeout: 0 0
warn: 13 13
incomplete: 0 0
dmesg-warn: 0 0
dmesg-fail: 0 0
changes: 0 5
fixes: 0 5
regressions: 0 0
total: 19908 19908
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900
Tested-by: Heiko Przybyl <[email protected]>
Tested-on: Barts PRO HD6850
Signed-off-by: Heiko Przybyl <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The variable actually needs to be signed, otherwise converting it to a
float doesn't work as expected.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98914
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Nayan Deshmukh <[email protected]>
Cc: "13.0" <[email protected]>
Fixes: 1fb4179f927 ("vl: Fix trivial sign compare warnings")
|
|
|
|
|
|
|
|
| |
handle the cases when vl_compositor_set_csc_matrix(),
vl_compositor_init_state() and vl_compositor_init() fail
Signed-off-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
handle the cases when vl_compositor_set_csc_matrix(),
vl_compositor_init_state() and vl_compositor_init() fail
Signed-off-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
pipe_buffer_map and pipe_buffer_create may return NULL
Signed-off-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The function will be used later to create the filedescriptor
for other metrics.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dump values for every selected data source in GALLIUM_HUD.
Every data source has its own file and the filename is
equal to the data source identifier.
Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory.
No values are dumped if the environment variable is not set, the
directory doesn't exist or the user doesn't have write access.
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
See:
dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_vec2_yyyy_fragment
if we only access (in FS) varying.y then it ends up in slot zero.. I'm
not sure the hw likes that..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Jonas's patch got us most of the benefit of scheduling instructions into
the delay slots of thread switch, but if there had been nothing to pair
the thrsw with, it would move the thrsw up and leave a NOP where the thrsw
was.
Instead, don't pair anything with thrsw through the normal scheduling
path, and have a separate helper function that inserts the thrsw earlier
if possible and inserts any necessary NOPs.
total instructions in shared programs: 93027 -> 92643 (-0.41%)
instructions in affected programs: 14952 -> 14568 (-2.57%)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scan for instructions without a signal set in front of the switching
instruction and move the signal up there.
shader-db results:
total instructions in shared programs: 94494 -> 93027 (-1.55%)
instructions in affected programs: 23545 -> 22078 (-6.23%)
v2: Fix re-emitting of the instruction in the loop trying to emit NOPs,
drop a scheduling change from branch delay slots. (by anholt)
Signed-off-by: Jonas Pfeil <[email protected]>
|
|
|
|
|
| |
This successfully unrolls a new shader in GLB2.7, which also gets that
shader to successfully compile in multithreaded mode.
|
|
|
|
|
|
|
| |
It should actually be 32 for a4xx/a5xx.. we still only advertise 16 but
for a5xx the linkage map includes position/psize.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
We need this in case it is streamed out. Not sure why we were treating
it specially before. Having it as a VS out is harmless if FS doesn't
have a matching input.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We'll need to revisit when adding hw binning pass support, whether we
can still do this in main draw step, as we do w/ a3xx/a4xx, or if we
needed to move it to the binning stage.
Still some failing piglits but most tests pass and the common cases seem
to work.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Pull in a5xx streamout related regs. Also fixes a couple incorrect
register definitions.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Update address calculation to support 64b addresses.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Rework how we lay out driver constants (driver-params, UBO/TFBO buffer
addresses, immediates) for more flexibility. For a5xx+ we need to deal
with the fact that gpu ptrs are 64b instead of 32b, which makes the
fixed offset scheme not work so well. While we are dealing with that
we might also make the layout more dynamic to account for varying # of
UBOs, etc.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Reloc for the buffer address is two dwords on 64b devices (a5xx+)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Seems to be imilar to a4xx, and sampler state "array-pitch" needs
to be aligned to page size.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For dealing w/ 32b vs 64b gpu addresses, I need to rework how we pass
UBO buffer addresses to shader, and knowing up front the # of UBOs is
useful. But I noticed ttn wasn't setting this.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Presently errors from frontend are handled only if they occur in
clang::CompilerInvocation::CreateFromArgs(). This patch uses
clang::DiagnosticsEngine to detect errors such as invalid values for
Clang frontend arguments.
Fixes Piglit's cl/program/build/fail/invalid-version-declaration.cl
test.
v2: fix inconsistent code formatting
Signed-off-by: Vedran Miletić <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Tested-by: Aaron Watry <[email protected]>
|