| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
The alignment of a virtual memory area must always be at least 4096 bytes.
It only worked because size was aligned to 4096 outside of the function.
Signed-off-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup the duplicating flags and consolidate into a sigle variable.
Note: this patch adds VISIBILITY_CFLAGS to the following targets
* freedreno/drm
* i915/{drm,sw}
* nouveau/drm
* sw/fbdev
* sw/null
* sw/wayland
* sw/wrapper
* sw/xlib
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Update additional register fields.
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
| |
The type-2 NOPs are said to be unstable. It doesn't make a difference here.
|
|
|
|
|
|
| |
Otherwise OpenGL/VDPAU interop won't work as expected.
Signed-off-by: Christian König <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
Using atomic function for ncs is superfluous since it is
protected by a mutex anyway. Also lock the mutex only once
while retrieving the next CS for submission.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calling radeon_drm_cs_flush from multiple threads might cause deadlocks,
fix this by immediately signaling the semaphore after waiting for it.
This is a candidate for the stable branch(es).
Partially fixes: https://bugs.freedesktop.org/show_bug.cgi?id=70123
v2: some fixes on commit message
Signed-off-by: Christian König <[email protected]>
|
|
|
|
| |
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Only create one screen for each winsys instance.
This helps with buffer sharing and interop handling.
v2: rebased and some minor cleanup
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Similar to GFX and DMA.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Share the winsys between different fd's if they point to the same device.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Waiting for an empty queue is nonsense and can lead to deadlocks if we have
multiple waiters or another thread that continuously sends down new commands.
Just post the cs to the queue and immediately wait for it to finish.
This is a candidate for the stable branch.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Kill the thread only after we checked that it's not used any more, not before.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This aligns the gfx, compute, and dma IBs to 8 DW boundries.
This aligns the the IB to the fetch size of the CP for optimal
performance. Additionally, r6xx hardware requires at least 4
DW alignment to avoid a hw bug. This also aligns the DMA
IBs to 8 DW which is required for the DMA engine. This
alignment is already handled in the gallium driver, but that
patch can be removed now that it's done in the winsys.
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
CC: "9.2" <[email protected]>
CC: "9.1" <[email protected]>
|
|
|
|
|
|
| |
And remove libdrm/ from a winsys include statement.
Signed-off-by: Jonathan Gray <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).
The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).
This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
"radeonsi: add FMASK texture binding slots and resource setup"
CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.
v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later)
- turn the bool CP DMA parameters into self-explanatory flags
- add a nice simple API for packet emission to radeon_winsys.h
- use 256 contexts, 128 causes texture corruption in openarena
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cayman and trinity systems still seem to suffer from
stability problems with GPUVM. This also fixes compute
on these asics. It can still be enabled for testing
by setting env var RADEON_VA=true.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=65958
Signed-off-by: Alex Deucher <[email protected]>
CC: "9.2" <[email protected]>
CC: "9.1" <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
| |
The original idea was that cs=NULL should be allowed here, but we never used
NULL until 862f69fbe1e54e0e9a3c439450a14f. This fixes a segfault in CoreBreach.
|
|
|
|
|
|
|
| |
Add the infrastructure to differentiate them.
Just treat them like SI for now.
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
| |
Covers the entire family.
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Set env var RADEON_VA=0 to disable VM on Cayman/Trinity.
Useful for debugging.
Note: this is a candidate for the 9.1 branch.
Signed-off-by: Alex Deucher <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
RADEON_GEM_WAIT_IDLE is declared DRM_IOW but mesa
uses it with drmCommandWriteRead instead of drmCommandWrite
which leads to the ioctl being unmatched and returning an
error on at least OpenBSD.
Problem originally noticed in libdrm by Mark Kettenis.
Dave Airlie pointed out that mesa has the same issue.
Signed-off-by: Jonathan Gray <[email protected]>
|
|
|
|
|
|
|
| |
Note: this is a candidate for the 9.1 branch
Signed-off-by: Alex Deucher <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
| |
One build system for linux/unix only drivers should be enough.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48694
Acked-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This move the tracing timeout and printing into winsys and add
an debug environement variable for it (R600_DEBUG=trace_cs).
Lot of file touched because of winsys API changes.
v2: Do not write lockup file if ib uniq id does not match last one
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Separated from UVD patch for clarity.
v2: sync with next tree for 3.10
v3: as pointed out by Andreas Bool check for drm minor >= 32
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.10-wip
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.
When enabled after each cs submission the code will try to detect lockup by
waiting on one of the buffer of the cs to become idle, after a timeout it
will consider that the cs triggered a lockup and will write a radeon_lockup.c
file in current directory that have all information for replaying the cs.
To build this file :
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs
Signed-off-by: Jerome Glisse <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The problem is that we mix bo handles and flinked names in the hash
table. Because kms type handles are not flinked they should not be
added to the hash table. If we do that we will sooner or later
get a situation where we will overwrite a correct entry because
the bo handle was the same as a flinked name.
Note: this is a candidate for the stable branches.
Reviewed-by: Jerome Glisse <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the same context try to flink and open the object, use the
same bo struct instead of opening a new gem handle for the object.
This way we avoid avoid having 2 different handle pointing to the
same kernel object which can latter lead to trouble with virtual
address.
Fix:
https://bugs.freedesktop.org/show_bug.cgi?id=60200
Signed-off-by: Martin Andersson <[email protected]>
Reviewed-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
|
| |
Make sure one can identify virtual address failure from allocation
failure.
Signed-off-by: Jerome Glisse <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Alex Deucher <[email protected]>
Note: this is a candidate for the 9.1 branch
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We are now seing cs that can go over the vram+gtt size to avoid
failing flush early cs that goes over 70% (gtt+vram) usage. 70%
is use to allow some fragmentation.
The idea is to compute a gross estimate of memory requirement of
each draw call. After each draw call, memory will be precisely
accounted. So the uncertainty is only on the current draw call.
In practice this gave very good estimate (+/- 10% of the target
memory limit).
v2: Remove left over from testing version, remove useless NULL
checking. Improve commit message.
v3: Add comment to code on memory accounting precision
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add ring support, you can create a cs for each ring. DMA ring is
bit special regarding relocation as you must emit as much relocation
as there is use of the buffer.
v2: - Improved comment on relocation changes
- Use a single thread to queue cs submittion this simplify driver
code while not impacting performances. Rational for this is that
you have to wait for all previous submission to have completed
so there was never a case while we could have 2 different thread
submitting a command stream at the same time. This code just
consolidate submission into one single thread per winsys.
v3: - Do not use semaphore for empty queue signaling, instead use
cond var. This is because it's tricky to maintain an even number
of call to semaphore wait and semaphore signal (the number of
cs in the stack would for instance make that number vary).
Signed-off-by: Jerome Glisse <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Andreas Boll <[email protected]>
- don't remove compatibility with scripts for the old build system
v3: Andreas Boll <[email protected]>
- remove more obsolete hacks
v4: Andreas Boll <[email protected]>
- add a previously removed TOP variable to fix vgapi build
|
| |
|
| |
|
|
|
|
| |
This should reduce the number of hash collisions in ETQW.
|
|
|
|
|
|
|
|
| |
Upcoming async dma support rely on winsys knowing about GPU families.
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jerome Glisse <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I fixed the only known bugs on r500 with 0222b2bd4107b9e5cabfbc06c1a6ca3eae.
Now there are no piglit regressions with Hyper-Z and all apps I tested seem
to work.
To summarize how it works:
- Only one process can use it at a time. This is a hardware limitation.
- The first process to clear a zbuffer gets the exclusive access to use
Hyper-Z.
- Compositors don't use any zbuffer, so they won't steal it, but some web
browsers do, so make sure there's no web browser running if you want your
game to use Hyper-Z.
- There's no need to restart an app which couldn't get the access to Hyper-Z.
Just quit the app which took it, the driver can turn it on for the other app
in the middle of rendering.
- If an app gets the access to Hyper-Z, it prints "radeon: Acquired Hyper-Z"
to stdout.
r300-r400:
Hyper-Z will be enabled by default on r300-r400 once sufficient testing is
done with piglit and Lightsmark at least.
Be sure to set the env var RADEON_HYPERZ and run piglit with parameters: -c 0
|
|
|
|
|
|
|
|
|
|
|
| |
Don't cache pointers to elements of reallocatable array.
In some circumstances it caused false cache hits resulting in incorrect
command stream and gpu lockup.
Note: This is a candidate for the stable branches.
Signed-off-by: Vadim Girlin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch has been generated by the following Coccinelle semantic
patch:
// Don't cast the return value of malloc/realloc.
//
// Casting the return value of malloc/realloc only stands to hide
// errors.
@@
type T;
expression E1, E2;
@@
- (T)
(
_mesa_align_calloc(E1, E2)
|
_mesa_align_malloc(E1, E2)
|
calloc(E1, E2)
|
malloc(E1)
|
realloc(E1, E2)
)
|