| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
lowering pass.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction. Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
opcodes.
This was causing the scheduler to be rather optimistic about the
latency of pull constant opcodes on Gen7+. This might seem to
increase the cycle count estimate calculated by the scheduler itself
for some shaders, even though the actual cycle count should actually
be decreased.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
For consistency with the Gen7 variant. I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, we can deal with pull constant loads in the same way it's done
for texturing during SIMD lowering.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the LOAD_PAYLOAD instruction only has header sources it's possible
for the number of registers written to be less than or equal to the
SIMD component size, in which case it would take the single-MOV path
at the bottom which would cause the channel enable masks to be applied
incorrectly to the header contents and/or cause it to write past the
end of the allocated temporary. If the instruction is either
LOAD_PAYLOAD or doesn't write exactly one component the MOV path is
going to mess up the program so just don't use it.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the source value is going to the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk. As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source. In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly though, this should be
good enough for the moment.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This will be useful in the SIMD lowering pass.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was introduced in cf375a3333e54a01462f192202d609436e5fbec8 but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error. This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.
Cc: Iago Toral Quiroga <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand): Destroy command buffers in DestroyCommandPool().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95034
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
|
| |
This way, if you have other cards installed, the Vulkan driver will still
work. No guarantees about WSI working correctly but offscreen should at
least work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
|
|
|
|
|
| |
This way we don't have to assume that the string coming in is a piece of
constant data that exists forever.
|
| |
|
|
|
|
|
|
| |
At one point in time, we may have used the mapping to ISL_FORMAT_RAW for
certain buffer surfaces but that time has long since passed. This fixes a
bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
|
| |
|
|
|
|
|
|
|
| |
gcc didn't warn about the unsigned / enum pipe_prim_type mismatch
between the .c and .h file.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In d8347f12ead89c5a58f69ce9283a54ac8487159c, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.
This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Also, we don't actually need it for clipping because meta always colors
inside the lines and, for all other operations, the user is required to set
a scissor. Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as
little as possible.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is in contrast to emitting it directly in vkCmdPipelineBarrier. This
has a couple of advantages. First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs. Second, it allow us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This has been declared as a uint since SNB but it's only one bit.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of blasting it out as part of the pipeline, we put it in the
command buffer and only blast it out when it's really needed. Since the
PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a
stall which we would like to avoid.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.
This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e863a52b610f112884ae12d42513770
glsl: handle implicit sized arrays in ssbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <[email protected]>
Reported-by: Mark James
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
These all show up as unused warnings here, so drop them for now.
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Without this, samplers get left hanging as derefs to variables that don't
actually exist.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
| |
Now that we have the better nir_foreach_block macro, there's no reason to
use the archaic block version for everything.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Suggested-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of doing a add and then mask out the upper bits, we can
simply do a add with a half wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Charmaine Lee <[email protected]>
|
|
|
|
| |
Signed-off-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
isl.c:62:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
assert(ISL_DEV_GEN(dev) == dev->info->gen);
^~
isl.c:63:33: warning: self-comparison always evaluates to true [-Wtautological-compare]
assert(ISL_DEV_USE_SEPARATE_STENCIL(dev) == dev->use_separate_stencil);
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.
Signed-off-by: Ilia Mirkin <[email protected]>
Tested-by: Dave Airlie <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|