| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
u_upload_mgr sets it, so that util_range_add can skip the lock.
The time spent in tc_transfer_flush_region decreases from 0.8% to 0.2%
in torcs on radeonsi.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the current code, we didn't do the space checks prior
to atomic counter setup emission, but we also didn't add
atomic counters to the space check so we could get a flush
later as well.
These flushes would be bad, and lead to problems with
parallel tests. We have to ensure the atomic counter copy in,
draw emits and counter copy out are kept in the same command
submission unit.
This reworks the code to drop some useless masks, make the
counting separate to the emits, and make the space checker
handle atomic counter space.
[airlied: want this in 18.2]
Fixes: 06993e4ee (r600: add support for hw atomic counters. (v3))
|
| |
|
|
|
|
| |
Acked-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
v2: add Glenn's fixes
Signed-off-by: Glenn Kennard <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses a different shader than radeonsi, as we can't address non-256
aligned ssbos, which the radeonsi code does. This passes some extra
offsets into the shader.
It also contains a set of u64 instruction implementation that may
or may not be complete (at least the u64div is definitely not something
that works outside this use-case). If r600 grows 64-bit integers,
it will use the GLSL lowering for divmod.
Reviewed-by: Roland Scheidegger <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This just adds the compute paths to state handling for the
main objects
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just builds on the image support. Evergreen only has ssbo
for fragment and compute no other stages.
v2: handle images and ssbo in the same shader properly (Ilia)
v3: fix RESQ on buffers,
fix missing atom emit
fix first element offset
use R32 format
write separate buffer rat store path.
(from running deqp gles3.1 tests)
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
and handle PIPE_FLUSH_HINT_FINISH in r300.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds the atoms and gallium api implementations,
along with support for compress/decompress paths for
shader images.
Tested-By: Gert Wollny <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
…and print error in such case. Which probably is not a rare event btw
because fopen doesn't expand ~ to $HOME.
Also get rid of unused "bool ret" variable.
Signed-off-by: Constantine Kharlamov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a poor man's version of radeonsi ddebug stuff, this
should get hooked into that infrastructure, and grow more stuff,
but for now, just create R600_TRACE var that points to a file
that you want to dump the last IB to.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Also change gs_output_prim type: unsigned → pipe_prim_type. The idea of
the code is mostly taken from radeonsi. The new code operating on
prev/curr rast_primitives saves ≈15 reloads of PA_SC_LINE_STIPPLE per
frame in Kane&Lynch2
Signed-off-by: Constantine Kharlamov <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
| |
Call r600_dma_emit_wait_idle only when there is a possibility of
a read-after-write hazard. Buffers not yet used by the SDMA IB don't
have to wait.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Check for device reset on flush. It would be nicer if the kernel just
reported this as an error on the submit ioctl (and similarly for fences),
but this will do for now.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calculate depth ranges from viewport states and
pipe_rasterizer_state::clip_halfz.
The evergreend.h change is required to silence a warning.
This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.
This fixes unflushed fences that can get stuck due to empty IBs.
|
|
|
|
|
|
|
| |
This will be used as a counter for whether fence_finish needs to flush
the IB.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
always set
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow removing useless cache & IB flushes.
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This has been wrong all along. Fixing this will allow removing useless
cache flushes.
Cc: 11.1 11.2 12.0 <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Tested-by: Grazvydas Ignotas <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Mostly generated using a sed-script, with manual fix-up for multi-line
statements.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
It's the same as radeonsi. This adds guard band support to r600g.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Grigori Goronzy <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
not used anymore
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
All of them are paused only between IBs.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cons:
- it was only integrated in r600g
- it doesn't work with GPUVM
- it records buffer contents at the end of IBs instead of at the beginning,
so the replay isn't exact
- it lacks an IB parser and user-friendliness
A better solution is apitrace in combination with gallium/ddebug, which
has a complete IB parser and can pinpoint hanging CP packets.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using tessellation on eg/ni chipsets, we must disable
dynamic GPRs to workaround a hw bug where the GPU hangs
when too many things get queued.
This implements something like the r600 code to emit
the transition between static and dynamic GPRs, and to
statically allocate GPRs when tessellation is enabled.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
with tessellation vs can now run on ls, and tes can
run on vs or es, tcs runs on hs.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This moves to using an array of hw stages for the atoms.
Note this drops the 23 from the vertex shader, this value
is calculated internally when shaders are bound, so not
required here.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Use the priority flags and expand them.
This information will be used for debugging.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
| |
Now that R600_NUM_ATOMS is below 64, dirty atom tracking can be
simplified.
Signed-off-by: Marek Olšák <[email protected]>
|