| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
On 32-bit we need to use PRIu64 flags for printfs,
otherwise this segfaults in R600_DEBUG=help otherwise.
Reviewed-by: Marek Olšák <[email protected]>
Cc: "11.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR cursor API is exactly what we want for the builder's insertion
point. This simplifies the API, the implementation, and is actually
more flexible as well.
This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.
v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]> [v4]
Acked-by: Connor Abbott <[email protected]> [v4]
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: lots of improvements
This is like identity or trace, but simpler. It doesn't wrap most states.
Run with:
GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.
If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
/home/$username/dd_dumps/$processname_$pid_$index.
Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.
You can also do:
GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.
Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.
v2: drop pipe-loader changes, drop SConscript
rename dd.h -> dd_pipe.h
Acked-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
| |
This allows creating compute-only and debug contexts.
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that keeping the double math for now (float x + 0.5), as the results may
otherwise differ.
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.
Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Tessellation control shaders write to outputs as OUT[ADDR[0].x][0], make
sure to parse the indirect dimension on outputs.
Also tess control inputs/outputs and tess eval input declarations need
to receive the same treatment as geometry shader inputs.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: - lots of changes according to Emil Velikov's comments
- implemented radeon_winsys::read_registers
v3: - a lot of new work, many of them adapt to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initilization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more
v4: require libdrm 2.4.63
|
|
|
|
|
| |
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The cumulative value is useful for queries like the number of shader
compilations.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
vl/vl_mpeg12_bitstream.c: In function 'decode_slice':
vl/vl_mpeg12_bitstream.c:928:19: warning: unused variable 'extra' [-Wunused-variable]
unsigned extra = vl_vlc_get_uimsbf(&bs->vlc, 1);
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
lp_bld_tgsi_soa.c: In function 'lp_emit_immediate_soa':
lp_bld_tgsi_soa.c:3065:18: warning: unused variable 'size' [-Wunused-variable]
const uint size = imm->Immediate.NrTokens - 1;
^
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
This will help remove some duplicated code from radeon.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Reviewed-by: Kai Wasserbäch <[email protected]>
|
|
|
|
|
|
| |
As it is needed for exp2.
Trivial.
|
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit a27ec5dc460b91dc44675f48cddbbb2631ee824f. It
breaks the intended behaviour of pipe_loader_probe() with ndev==0 as
relied upon by clover to query the number of devices available to the
pipe loader in the system.
Acked-by: Emil Velikov <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Use MSVCRT functions instead. Their semantics are slightly
different but they can be made to work as expected.
Also, use the same code paths for both MSVCRT and MinGW.
https://bugs.freedesktop.org/show_bug.cgi?id=91418
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Its only use is to implement a custom version of LLVMDumpValue
on some Windows and embedded platforms.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
All LLVM API calls that require an ostream object have been removed from
the disassemble() function, so we don't need to use this class to wrap
_debug_printf() we can just call this function directly.
Reviewed-by: Jose Fonseca <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
| |
There is no need for this.
v2: handle redundant clip state changes in st/mesa
|
| |
|
|
|
|
|
| |
Drivers can do this better, because they can skip redundant state changes
at per-slot granularity.
|
|
|
|
| |
Not needed for other shader stages.
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests
on radeonsi that use the draw module.
This is just a temporary solution. The correct solution will require
creating a TargetMachine during gallivm initialization and pulling the
DataLayout from there. This will be a somewhat invasive change, and it
will need to be validatated on multiple LLVM versions.
https://llvm.org/bugs/show_bug.cgi?id=24172
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
PIPE_CAPs will be added some other time.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PIPE_CAPs and TGSI support will be added later. The TGSI support should be
straightforward. We only need to split TGSI_FILE_RESOURCE into TGSI_FILE_IMAGE
and TGSI_FILE_BUFFER, though duplicating all opcodes shouldn't be necessary.
The idea is:
* ARB_shader_image_load_store should use set_shader_images.
* ARB_shader_storage_buffer_object should use set_shader_buffers(slots 0..M-1)
if M shader storage buffers are supported.
* ARB_shader_atomic_counters should use set_shader_buffers(slots M..N)
if N-M+1 atomic counter buffers are supported.
PIPE_CAPs can describe various constraints for early DX11 hardware.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Inspired (copied) from Marek's commit for egl/x11
commit 0b56e23e7f3(egl/dri2: use the correct screen index)
v2: Fix copy/pasta errors.
Cc: 10.6 <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The former handles O_CLOEXEC (and the lack of it) appropriately.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
The former handles O_CLOEXEC (and the lack of it) appropriately.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It was only useful for st/egl, although I've never got to merging the
pipe-loader and inline-helpers before it was removed. There are no users
for it ATM.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|