| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had an implicit assumption that the phi src was assigned in its
source (pred) block leading into the phi. But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src. Insert an extra mov in the source block. If it is not
required the CP pass will take it back out again.
Fixes:
./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test
and probably others.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0). So we'd end up with things like:
cmps.s.ge r1.x, ...
absneg.s r1.x, (neg)r1.x
absneg.s r1.x, (abs)r1.x
sel.b32 r2.x, r0.x, r1.x, r0.y
The (neg) already gets collapsed due to the following (abs). Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.
Signed-off-by: Rob Clark <[email protected]>
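The boolean conversions described above can be sketched in plain C. This is only an illustration of the arithmetic, not the ir3 code itself; the function names native_to_nir_bool and nir_to_native_bool are hypothetical. It shows why (abs) of a value that is already 1/0 is a no-op, which is what allows the (abs) to be dropped after a cmps.s.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, named for illustration only.
 * NIR-style boolean:    ~0 (all bits set, i.e. -1) for true, 0 for false.
 * Native-style boolean:  1 for true, 0 for false. */

/* (neg): native 1/0 -> NIR ~0/0, since -1 is all ones in two's complement. */
static int32_t native_to_nir_bool(int32_t b)
{
    assert(b == 0 || b == 1);
    return -b;
}

/* (abs): NIR ~0/0 -> native 1/0, since abs(-1) == 1. */
static int32_t nir_to_native_bool(int32_t b)
{
    assert(b == 0 || b == -1);
    return b < 0 ? -b : b;
}
```

For any native boolean x, nir_to_native_bool(native_to_nir_bool(x)) equals x, so when x is known to come from a compare that already yields 1/0, both modifiers are redundant.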
|
|
|
|
|
| |
Signed-off-by: Michel Dänzer <[email protected]>
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.
v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we end up with funny things like:
mov.f32f32 r0.x, r1.y
mov.f32f32 r0.x, r1.y
(It doesn't happen as much after fixing the problem w/ CP into phi src,
but it can still happen since we aren't too clever about generating phi
sources in the first place.)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
We want to consider all the vars, not 1/32nd of them, when extending
live-ranges.
Signed-off-by: Rob Clark <[email protected]>
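One way such a "1/32nd" mistake can arise, sketched under the assumption that the live set is a bitset packed into 32-bit words (the commit message does not say where the actual bug was): bounding a per-variable loop by the number of words instead of the number of bits. The EX_WORDS macro and ex_* functions below are hypothetical names, not Mesa's.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 32-bit words needed to hold nbits bits (hypothetical macro). */
#define EX_WORDS(nbits) (((nbits) + 31u) / 32u)

/* Test one bit of a packed bitset. */
static int ex_test(const uint32_t *set, unsigned bit)
{
    return (set[bit / 32u] >> (bit % 32u)) & 1u;
}

/* Visits every variable index 0..nvars-1.  Bounding this loop by
 * EX_WORDS(nvars) instead of nvars would visit only about 1/32nd
 * of the variables. */
static unsigned ex_count_live(const uint32_t *set, unsigned nvars)
{
    unsigned n = 0;
    for (unsigned i = 0; i < nvars; i++)
        if (ex_test(set, i))
            n++;
    return n;
}
```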
|
|
|
|
|
|
|
|
| |
The block defining a phi source might not have been executed. If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes some transform-feedback piglits, like:
bin/ext_transform_feedback-nonflat-integral
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Turned out to be useful to debug an issue in RA. Let's keep it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
No longer used, so drop the extra arg to ir3_instr_create()
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Been on my TODO list for a while. If nothing else this will make gdb
properly grok the opc_t enum.
This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t). Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For adding .v4f32-like suffixes to intrinsics, taking special care of
the scalar case, which was often neglected.
This fixes invalid IR when doing mipmap filtering on SSE2 (the only
case where we'd use intrinsics with scalars.)
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Exactly the same code.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We could unconditionally use these intrinsics, but performance with SSE2
would suck, as LLVM falls back to calling libm.
lp_test_arit.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
For simulating less capable machines.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
| |
It builds fine now. Probably due to C99 support.
Trivial.
|
|
|
|
|
|
|
|
|
| |
LLVM often can't determine the mask elements are all ones/zeros, and
there doesn't seem to be a good way to hint that.
Thanks to Roland Scheidegger for spotting and analyzing the issue.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
No longer needed.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Only provide a fallback for LLVM 3.3.
One less dependency on LLVM C++ interface.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
The current DSQRT lowering code emits an OP_SELP, so we have to handle
its emission. This will eventually go away, but no harm supporting this
op.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes piglit tests like
tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test
and related ones.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.1 11.2" <[email protected]>
|
|
|
|
|
|
|
| |
nvc0 and nve4 have been respectively replaced by gf100 and gk104.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Rather than the currently bound texture. This goes along with the
earlier patch to get away from examining bound textures and sampler
views during shader translation.
Fixes VMware bug 1632739.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
| |
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IRs.
The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few other
shader-related ones that seemed like they would also typically depend
on the compiler.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The value 0 for unknown has been chosen so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
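A minimal sketch of the "0 means unknown" convention, using made-up names (example_prop, example_shader_info, example_scan) rather than the real TGSI types: a zero-initialized struct reports the unknown value without any extra detection code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical property enum: 0 is reserved for "unknown". */
enum example_prop {
    EXAMPLE_PROP_UNKNOWN = 0,
    EXAMPLE_PROP_A,
    EXAMPLE_PROP_B
};

/* Hypothetical scan result, zero-initialized before scanning. */
struct example_shader_info {
    enum example_prop prop;
    unsigned num_instructions;
};

/* Only writes the property when the shader declares it; otherwise the
 * field keeps its zero value, which already means "unknown". */
static struct example_shader_info example_scan(int shader_declares_prop)
{
    struct example_shader_info info;
    memset(&info, 0, sizeof(info));
    if (shader_declares_prop)
        info.prop = EXAMPLE_PROP_A;
    return info;
}
```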
|
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevents using
compute.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes 84b9b8f (nvc0/ir: add missing emission of locked load
predicate).
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
Make sure to avoid out of bounds access in presence of indirect
array indexing by loading the size from the driver constant buffer.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The grid size is stored as three 32-bit integers in the indirect
buffer, but the launch descriptor uses a single 32-bit integer for both
griddim_y and griddim_z, like this: (z << 16) | y. To make it work,
the 16 high bits of griddim_y are overwritten by griddim_z.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
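The packing described above, written out as a small C sketch. The helper names are hypothetical and the code only illustrates the bit layout (z << 16) | y; it is not the driver's macro or pushbuf code.

```c
#include <assert.h>
#include <stdint.h>

/* Pack griddim_y (low 16 bits) and griddim_z (high 16 bits) into one word. */
static uint32_t pack_griddim_yz(uint32_t y, uint32_t z)
{
    assert(y <= 0xffffu && z <= 0xffffu);
    return (z << 16) | y;
}

/* Overwrite the high 16 bits of a word that holds y with z, keeping the
 * low 16 bits untouched. */
static uint32_t overwrite_high16(uint32_t word, uint32_t z)
{
    assert(z <= 0xffffu);
    return (word & 0xffffu) | (z << 16);
}
```

Both helpers produce the same word when the y value fits in 16 bits.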
|
|
|
|
|
|
|
|
|
|
|
| |
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.
Suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Uniform buffer objects will be stuck to the driver constant buffer,
like buffers, because the launch descriptor only allows 8 CBs.
Input kernel parameters for OpenCL are still uploaded to screen->parm
which is bound on c0, but this will be changed later with a new series.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of using the screen->parm buffer object, which will be removed,
upload auxiliary constants to uniform_bo to be consistent with
what we already do for Fermi.
This breaks surfaces support (for compute only) but this will be
properly re-introduced later for ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
By using os_log_message directly, as _debug_vprintf truncates messages
to 4K.
Also clean up the disassemble interface.
Spotted by Roland.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Android Bionic does not support the strchrnul() string function, but
gallium auxiliary util/u_string.h provides util_strchrnul().
This change avoids the following build error:
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
undefined reference to 'strchrnul'
collect2: error: ld returned 1 exit status
Cc: [email protected]
Signed-off-by: Emil Velikov <[email protected]>
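For reference, the behavior of strchrnul() is easy to reproduce portably. The sketch below uses a hypothetical name, my_strchrnul, and is not the util_strchrnul() implementation from util/u_string.h; it only shows what such a fallback has to do: return a pointer to the first match, or to the terminating NUL when there is none.

```c
#include <assert.h>
#include <string.h>

/* Like strchr(), but returns a pointer to the terminating '\0' instead
 * of NULL when c does not occur in s. */
static char *my_strchrnul(const char *s, int c)
{
    while (*s != '\0' && *s != (char)c)
        s++;
    return (char *)s;
}
```

Because the result is never NULL, callers can compute the length of a line or token by pointer subtraction without a separate check.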
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an old patch I had around.
Vector selects seem to work well from LLVM 3.3. Using them should
improve code quality, as it might make the constant propagation pass
more effective.
Tested lp_test_*
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
vdpau has recently come to rely on this, so make sure to check it
properly.
Signed-off-by: Ilia Mirkin <[email protected]>
|