| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
We need to move this to the shared layer.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the VUE map has slots at the end which the shader does not write,
then we'd "flush" (constructing an URB write) on the last output it
actually wrote. Then, we'd construct another SEND with EOT, but with
no actual payload data. That's not legal.
For example, SSO programs have clip distance slots allocated no matter
what, but the shader may not write them. If it doesn't write any user
defined varyings, then the clip distance slots will be the last ones.
Found while debugging
dEQP-VK.tessellation.shader_input_output.gl_position_vs_to_tcs_to_tes
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We can use this to track various features that may or may not be supported
by the hw / kernel. Currently, we usually do this by checking the generation
and supported command parser versions in various places thoughtout the driver
code. With this patch, we centralize all these checks in just once place at
screen creation time, then we just query the bitfield wherever we need to
check if a particular feature is supported.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Instead, check the screen field directly.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moving the test to the screen places it alongside the other global HW
feature tests that want to be shared between contexts.
Also, we need to know if we support pipelined register writes at
screen creation time so that we can tell if we can expose OpenGL 4.0
in gen7.
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
function was removed by b3360d23ac1db61390b2ac8963756c6133ba6e23
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes tests 'dEQP-GLES3.functional.texture.mipmap.*.generate.rgba5551*' on
Intel Broadwell 0x1616.
The GL 4.5 spec describes the algorithm of glGenerateMipmap as:
The contents of the derived images are computed by repeated, filtered
reduction of the level base image. [...] No particular filter algorithm is
required, though a box filter is recommended as the default filter.
Consider a texture for which all pixels are identical at level 0.
From the spec's description above, one may reasonably assume that the "filtered
reduction" of level 0 produces a new miplevel for which again all pixels are
identical. For any 2x2 subspan of identical pixels, it is difficult to see how
the "filtered reduction" of that subspan can produce a pixel that differs from
the source pixels.
Dithering during _mesa_meta_GenerateMipmap() violated that reasonable
assumption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99210
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In its current state the unified i965 backend for
AMD_performance_monitor and INTEL_performance_query isn't able to report
meaningful Observation Architecture metrics since we haven't so far had
the necessary kernel support to fully configure the OA unit, nor the
corresponding support for normalizing the counters into a form that can
be usefully interpreted by application developers (as opposed to raw
values that may, for example, scale by the number of EUs there are).
So that we can focus on implementing just one of these extensions fully
and since we anticipate some significant backend changes as we look to
use a new kernel interface to configure the OA unit, this patch removes
the current backend. This will simplify our ability to update the
frontend infrastructure and backend interface before updating our
support for performance counters.
Signed-off-by: Robert Bragg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FROM_DOUBLE opcodes are setup so that they use a dst register
with a size of 2 even if they only produce a single-precison
result (this is so that the opcode can use the larger register to
produce a 64-bit aligned intermediary result as required by the
hardware during the conversion process). This creates a problem for
spilling though, because when we attempt to emit a spill for the
dst we see a 32-bit destination and emit a scratch write that
allocates a single spill register, making the intermediary writes
go beyond the size of the allocation.
Prevent this by avoiding to spill the destination register of these
opcodes.
Alternatively, we can avoid this by splitting the opcode in two: one
that produces a 64-bit aligned result and one that takes the 64-bit
aligned result as input and produces a 32-bit result from it.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When 64-bit registers are (un)spilled, we need to execute data shuffling
code before writing to or after reading from memory. If we have instructions
that operate on 64-bit data via 32-bit instructions, (un)spills for the
register produced by 32-bit instructions will not do data shuffling at all
(because we only see a normal 32-bit istruction seemingly operating on
32-bit data). This means that subsequent reads with that register using DF
access will unshuffle data read from memory that was never adequately
shuffled when it was written.
Fixing this would require to identify which 32-bit instructions write
64-bit data and emit spill instructions only when the full 64-bit
data has been written (by multiple 32-bit instructions writing to different
offsets of the same register) and always emit 64-bit unspills whenever
64-bit data is read, even when the instruction uses a 32-bit type to read
from them.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current spilling code can't spill vgrf allocations larger than 1
but SIMD4x2 doubles require 2 vgrfs, so we need to permit this case (which
is handled properly for DF data types by emitting 2 scratch messages and
doing data shuffling). We accomplish this by not auto-disabling spilling
for vgrf allocations with a size of 2, and then disable spilling on any
register with an offset != 0B (which indicates array access).
Disable spilling of partial DF reads/writes because these don't read/write
data for both logical threads and our scratch messages for 64-bit data
need data for both threads to be present.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
8-wide compressed DF operations are executed as two separate 4-wide
DF operations. In that scenario, we have to be careful when we allocate
register space for their operands to prevent the case where the first
half of the instruction overwrites the source of the second half.
To do this we mark compressed instructions as having hazards to make
sure that ther register allocators assigns a register regions for the
destination that does not overlap with the region assigned for any
of its source operands.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.
This also fixes ZW writes from X/Y channels like in:
mov r2.z:df r0.xxxx:df
Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so
we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0
in these cases we get to replicate the XX swizzle and write what we want.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels specified by
the 3rd and 4th logical swizzle components.
Notice that this opens up the possibility that some instructions are not
scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always
scalarize in such cases.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.
v2:
- Make static the function and fix indent (Matt)
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This came in handy when debugging the payload setup for Tess Eval,
since it prints correct subnr for attributes that can be loaded
in the second half of a register.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a width of 2 with 64-bit attributes.
Also, if we have a dvec3/4 attribute that gets split across two registers
such that components XY are stored in the second half of a register and
components ZW are stored in the first half of the next, we need to fix
regioning for any instruction that reads components Z/W of the attribute.
Notice this also means that we can't support sources that read cross-dvec2
swizzles (like XZ for example).
v2: don't assert that we have a single channel swizzle in the case that we
have to fix up Z/W access on the first half of the next register. We
can handle any swizzle that does not cross dvec2 boundaries, which
the double scalarization pass should have prevented anyway.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
v2: use byte_offset() instead of offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
v2: use byte_offset() instead of offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
v2: use byte_offset() instead of offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Iago):
- Adapt 64-bit path to component packing changes.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Signed-off-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.
v2: use byte_offset() instead of offset().
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This way callers don't need to know about 64-bit particularities and
we reuse some code.
v2:
- use byte_offset() instead of offset()
- only mark the surface as used once
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
v2: adapt to changes in offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
v2: adapt to changes in offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Mostly the same stuff as usual: we ned to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.
v2: use byte_offset() instead of offset().
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Setup for a 64-bit scratch read by checking the type size of the
correct register
v3: Use byte_offset() instead of offset()
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
A vec4 is 16 bytes and a dvec4 is 32 bytes so for doubles we have
to multiply the reladdr by 2. The reg_offset part is in units of 16
bytes and is used to select the low/high 16-byte chunk of a full
dvec4, so we don't want to multiply that part of the address.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
64-bit scratch read/writes require to shuffle data around so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this restriction, so it is probably fine.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- Add Broxton as Intel's internal PRMs says that it is needed (Matt).
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
This means we would copy propagate partial reads or writes and that can affect
the result.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
In gen < 8 instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Because the meaning of the swizzles and writemasks involved is different,
so replacing the source would lead to different semantics.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
v2: Also check if the instruction source target is 64-bit. (Samuel)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|