| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Requires Evergreen/Cayman and radeon kernel module
2.41.0 or newer.
Expected piglit fails due to hardware limitations:
* arb_draw_indirect-draw-arrays-prim-restart
Restarts not applied for DrawArrays commands
* arb_draw_indirect-vertexid
Base vertex offset is not included in vertex id
Marek: bump vgt_state num_dw by 3 (= space needed for one register write)
Signed-off-by: Glenn Kennard <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Since alu does not support abs() modifier on source operands, spill
and apply the modifiers to a temp register when needed.
Signed-off-by: Xavier Bouchoux <[email protected]>
Reviewed-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.
Signed-off-by: Glenn Kennard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit 32f2fd1c5d6088692551c80352b7d6fa35b0cd09, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation, (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
it also matches GL 4.2
further discussion:
http://lists.freedesktop.org/archives/mesa-dev/2013-August/042680.html
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is my first attempt at enabling r600/r700 geometry shaders,
the basic tests pass on both my rv770 and my rv635,
It requires this kernel patch:
http://www.spinics.net/lists/dri-devel/msg52745.html
v2: address Alex comments.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is Vadim's initial work with a few regression fixes squashed in.
v2: (airlied)
fix regression in glsl-max-varyings - need to use vs and ps_dirty
fix regression in shader exports from rebasing.
whitespace fixing.
v2.1: squash fix assert
Signed-off-by: Vadim Girlin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
| |
It looks like we need these for geom shaders in the future.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: fix regression on r600 NOP instructions.
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
| |
Also slightly optimize r600_buffer_map_sync_with_rings.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This streamout state code will be used by radeonsi.
There are new structures r600_common_context and r600_common_screen.
What is inherited by what is shown here:
pipe_context -> r600_common_context -> r600_context
pipe_screen -> r600_common_screen -> r600_screen
The common structures reside in drivers/radeon. Currently they only contain
enough functionality to be able to handle streamout. Eventually I'd like
the whole pipe_screen implementation to be shared and some of the context
stuff too.
This is quite big, but most changes are because of the new structures and
the fact r600_write_value is replaced by radeon_emit.
Thanks to Tom Stellard for fixing the build for r600g/compute.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
byteswap.h and bswap_32 aren't portable, replace them with calls to
gallium's util_bswap32 as suggested by Mark Kettenis. Lets these files
build on OpenBSD.
Signed-off-by: Jonathan Gray <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New disassembler is not completely isolated yet from further processing
in r600g/sb that is not required for printing the dump, so it has higher
probability to fail in case of any unexpected features in the bytecode.
This patch adds "sbdisasm" flag for R600_DEBUG that allows to use new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.
If shader optimization is enabled, new disassembler is used by default.
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
| |
Optimization is enabled with "R600_DEBUG=sb".
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
| |
Warning: "Conditional jump or move depends on uninitialised value(s)".
|
|
|
|
|
| |
This fixes bug 62756 :
https://bugs.freedesktop.org/show_bug.cgi?id=62756#c12
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Reduced stack size allows to run more threads in some cases,
improving performance for the shaders that use stack (that is, for the
shaders with control flow instructions). E.g. with unigine-based apps.
v4: implement exact computation taking into account wavefront size
v5: add cases for RV620, RS880
Signed-off-by: Vadim Girlin <[email protected]>
|
| |
|
| |
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only the disassembler is used to dump shaders. Here's a few examples
how to use R600_DEBUG.
Log compute info:
R600_DEBUG=compute
Dump all shaders:
R600_DEBUG=fs,vs,gs,ps,cs
Dump pixel shaders only:
R600_DEBUG=ps
Disable Hyper-Z:
R600_DEBUG=nohyperz
Disable the LLVM backend:
R600_DEBUG=nollvm
Or use any combination of the above, or print all options:
R600_DEBUG=help
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Fixes a llvm uncovered (rare) bug where consecutive exports were
merged even if they have incompatible mask.
|
|
|
|
|
| |
Tested-by: Vincent Lejeune <vljn at ovi.com>
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R600_DUMP_SHADERS environment var now allows to choose dump method:
0 (default) - no dump
1 - full dump (old dump)
2 - disassemble
3 - both
v2: fix output for burst_count > 1
v3: use more human-readable output for kcache data in CF_ALU_xxx clauses,
improve output for ALU_EXTENDED, other minor fixes
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v3: added some flags including condition codes for ALU,
fixed issue with CF reverse lookup (overlapping ranges of CF_ALU_xxx
and other CF instructions)
rebased on current master
Signed-off-by: Vadim Girlin <[email protected]>
|
|
|
|
|
|
|
|
| |
r600_bytecode::ar_chan stores the register channel for the value that
will be loaded into the AR register.
At the moment, this field is only used by the LLVM backend. The default
backend always sets ar_chan = 0.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We keep track of ring emission order in a stack, whenever we need to
flush we empty the stack in a fifo order. There is few helpers function
for bo mapping and other ring activities that will make sure that
the ring stack is properly flush and submitted.
v2: fix st flush path, and other flush path to properly flush all
rings if necessary
v3: - improve name of ring helpers
- make sure that each time a cs is gona be written it endup at
top of the stack to avoid any issue such as :
STACK[0] = dma (withbo A,B)
STACK[1] = gfx (withbo C,D)
Now if code try to emit a dma command relative to bo C or D
it will start writting cmd stream into the cs and once it
reach the point where it adds relocation it will flush.
At that point the cs will have cmd that don't have proper
relocation into the relocation buffer and kernel will just
refuse to run.
v4: - Drop the stack idea as it turn out there is no way to use it
or benefit from it. Any time the driver start command on other
ring, it always need to flush the previous ring. So make code
simpler by not using a stack.
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds TBO support to r600g, and with GLSL 1.40 enabled,
we now get 3.1 core profiles advertised for r600g.
The r600/700 implementation is a bit different from the evergreen one,
as r6/7 hw lacks vertex fetch swizzles. So we implement it by passing 5
constants per sampler to the shader, the shader uses the first 4 as masks
for each component and the 5th as the alpha value to OR in.
Now TXQ is also broken so we have to pass a constant for the buffer size,
on evergreen we just pass this, on r6/7 we pass it as the 6th element
in the const info buffer.
v1.1: drop return as DDX doesn't use a texture type
v2: add r600/700 support.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Fixes resource leak defect reported by Coverity.
Signed-off-by: Vinson Lee <[email protected]>
|
|
|
|
|
|
|
|
| |
Upcoming async dma support rely on winsys knowing about GPU families.
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fetch shaders are usually destroyed at the context destruction by the state
tracker, so we can put them all in a large buffer without wasting memory.
This reduces the number of relocations sent to the kernel a little bit.
Tested-by: Aaron Watry <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
|
|
|
|
|
|
|
|
|
|
| |
The 2x and 4x MSAA cases are completely broken. The lfdptr instruction returns
garbage there.
The 8x MSAA case is broken on Cayman, though at least the result looks somewhat
correct.
Only the 8x MSAA case works on Evergreen and is enabled.
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
|
| |
|
|
|
|
|
|
|
| |
The state object is actually a buffer, it's literally a buffer containing
the shader code.
Reviewed-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
| |
Not sure if this is the best way to fix it.
NOTE: This is a candidate for the stable branches.
|
|
|
|
|
|
|
|
|
|
| |
LOOP_START_DX10 ignores the LOOP_CONFIG* registers, so it is not limited
to 4096 iterations like the other LOOP_* instructions. Compute shaders
need to use this instruction, and since we aren't optimizing loops with
the LOOP_CONFIG* registers for pixel and vertex shaders, it seems like
we should just use it for everything.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Tom Stellard <[email protected]>
|