| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
s/otions/options/, and while here let's give the full path to xmlpool.h
since `../` won't be true in the generated file.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
We were at least cleaning up this reference, but we were failing to
pin it in iris_restore_compute_saved_bos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Today, we stream the compute shader thread IDs simply because they're
(annoyingly) relative to dynamic state base address. We could upload
them once at compile time, but we'd need a separate non-streaming
uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it.
stream_state pins the buffer for use in the current batch, but also
returns a reference to the pipe_resource. We dropped this reference
on the floor, leaking a reference basically every time we dispatched
a compute shader after switching to a new one.
The reason it returns a reference is so that we can hold on to it and
re-pin it in iris_restore_compute_saved_bos, which we were also failing
to do. So if we actually filled up a batch with repeated dispatches to
the same compute shader, and flushed, then continued dispatching, we
would fail to pin it and likely GPU hang.
|
|
|
|
|
|
| |
We were unconditionally uploading the new data, but then conditionally
using it with MEDIA_CURBE_LOAD. If we're not going to emit the command,
there's no point in uploading the data.
|
|
|
|
|
|
| |
We only use push the compute shader thread IDs, not any actual constant
buffer data. So we should track the compute shader variant changing,
not constbuf changes.
|
|
|
|
| |
Compute doesn't use UBO ranges (annoyingly), so this is dead code.
|
|
|
|
|
|
|
|
| |
MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address
field's docs say: "This bit specifies the 64-byte aligned address..."
And we were doing 32. Superfluous thread ID uploading was apparently
saving us from GPU hangs in most cases.
|
|
|
|
|
|
|
|
| |
v2: standard 'bool' can be used
( Eric Engestrom <[email protected]> )
Reviewed-by: Eric Engestrom <[email protected]>
Signed-off-by: Andrii Simiklit <[email protected]>
|
|
|
|
|
|
|
|
| |
When the fault_pointer field in the header is set, we can get some idea
of which descriptor the HW isn't happy with if we know their addresses.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Then we can get some information back about any exception that might
have happened.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Arm's kernel driver mentions how to decode this field, which makes a bit
clearer what had happened.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only exception is the GS copy shader which emits them
unconditionally.
Totals from affected shaders:
SGPRS: 71320 -> 71008 (-0.44 %)
VGPRS: 54372 -> 54240 (-0.24 %)
Code Size: 2952628 -> 2941368 (-0.38 %) bytes
Max Waves: 9689 -> 9723 (0.35 %)
This helps Dota2, Doom, GTAV and Hitman 2.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This doesn't fix anything known, but it's likely going to
break if layerCount is ~0U.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Leave it to NIR instead, like i965 does. Thanks to Tim Arceri for
noticing that I'd left this enabled by accident.
shader-db results on Skylake:
total instructions in shared programs: 15522628 -> 15521642 (<.01%)
instructions in affected programs: 94008 -> 93022 (-1.05%)
helped: 34
HURT: 33
helped stats (abs) min: 12 max: 48 x̄: 33.82 x̃: 42
helped stats (rel) min: 0.06% max: 22.14% x̄: 9.86% x̃: 10.89%
HURT stats (abs) min: 1 max: 16 x̄: 4.97 x̃: 3t
HURT stats (rel) min: 0.82% max: 3.77% x̄: 1.73% x̃: 1.53%
95% mean confidence interval for instructions value: -20.08 -9.35
95% mean confidence interval for instructions %-change: -5.95% -2.36%
Instructions are helped.
total cycles in shared programs: 367105221 -> 367074230 (<.01%)
cycles in affected programs: 10017660 -> 9986669 (-0.31%)
helped: 266
HURT: 184
helped stats (abs) min: 1 max: 9556 x̄: 151.35 x̃: 12
helped stats (rel) min: 0.08% max: 59.91% x̄: 4.66% x̃: 1.67%
HURT stats (abs) min: 1 max: 1716 x̄: 50.37 x̃: 6
HURT stats (rel) min: <.01% max: 24.40% x̄: 2.42% x̃: 0.85%
95% mean confidence interval for cycles value: -133.90 -3.84
95% mean confidence interval for cycles %-change: -2.44% -1.10%
Cycles are helped.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes the code which sets EmitNoIndirectSampler to check
the core profile GLSL version, rather than the ARB_gpu_shader5 extension
enable. st/mesa exposes ARB_gpu_shader5 if GLSLVersion (in core
profiles) or GLSLVersionCompat (in compat profiles) >= 400.
The Intel drivers do not currently expose ARB_gpu_shader5 in compat
profiles. But the backend can absolutely handle indirect samplers.
Looking at the core profile version number should be a good indication
of what the driver supports.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Nothing uses this, it must be a remnant from an earlier approach.
|
|
|
|
|
|
|
|
| |
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules. Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When handling the specified ALU operations, check for the decorations
and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly.
v2: Add a glsl_base_type_is_unsigned_integer() helper. (Karol)
v3: Rename helper to glsl_base_type_is_uint().
v4: Use two flags, so we don't need the helper anymore. (Connor)
v5: Pass alu directly to handle function. (Jason)
Reviewed-by: Karol Herbst <[email protected]> [v3]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They indicate the operation does not cause overflow or underflow.
This is motivated by SPIR-V decorations NoSignedWrap and
NoUnsignedWrap.
Change the storage of `exact` to be a single bit, so they pack
together.
v2: Handle no_wrap in nir_instr_set. (Karol)
v3: Use two separate flags, since the NIR SSA values and certain
instructions are typeless, so just no_wrap would be insufficient
to know which one was referred to. (Connor)
v4: Don't use nir_instr_set to propagate the flags, unlike `exact`,
consider the instructions different if the flags have different
values. Fix hashing/comparing. (Jason)
Reviewed-by: Karol Herbst <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
This is an emergency release due to a critical bug.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There doesn't seem to be any reason to keep these opcodes around:
* fnot/fxor are not used at all.
* fand/for are only used in lower_alu_to_scalar, but easily replaced
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
We can vectorize instructions with different constant sources by creating
a new load_const and using that.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Midgard, a bundle consists of a few ALU instructions. Within the
bundle, there is room for an optional 128-bit constant; this constant is
shared across all instructions in the bundle.
Unfortunately, many instructions want a 128-bit constant all to
themselves (how selfish!). If we run out of space for constants in a
bundle, the bundle has to be broken up, incurring a performance and
space penalty.
As an optimization, the scheduler now analyzes the constants coming in
per-instruction and attempts to merge shared components, adjusting the
swizzle accessing the bundle's constants appropriately. Concretely,
given the GLSL:
(a * vec4(1.5, 0.5, 0.5, 1.0)) + vec4(1.0, 2.3, 2.3, 0.5)
instead of compiling to the naive two bundles:
vmul.fmul [temp], [a], r26
fconstants 1.5, 0.5, 0.5, 1.0
vadd.fadd [out], [temp], r26
fconstants 1.0, 2.3, 2.3, 0.5
The scheduler can now fuse into a single (pipelined!) bundle:
vmul.fmul [temp], [a], r26.xyyz
vadd.fadd [out], [temp], r26.zwwy
fconstants 1.5, 0.5, 1.0, 2.3
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
| |
Fixes: 3e6c6bb0 ("panfrost: Merge checksum buffer with main BO")
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the past, each query object had their own BO. Checking if the batch
referenced that BO was an easy way to check if commands were still
queued to compute the query value. If so, we needed to flush.
More recently (c24a574e6c), we started using an u_upload_mgr for query
objects, placing multiple queries in the same BO. One side-effect is
that iris_batch_references is a no longer a reasonable way to check if
commands are still queued for our query. Ours might be done, but a
later query that happens to be in the same BO might be queued. We don't
want to flush in that case.
Instead, check if the current batch's signalling syncpt is the one we
referenced when ending the query. We know the syncpt can't have been
reused because our query is holding a reference, so a simple pointer
comparison should suffice.
Removes all batch flushing caused by query objects in Shadow of Mordor.
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
| |
This returns a pointer to the signalling syncpt, without incrementing
the reference count. This can be useful for comparisons.
Reviewed-by: Chris Wilson <[email protected]>
|
|
|
|
|
|
|
|
| |
The ss local var is guaranteed to be != NULL. Get rid of this useless
check.
Signed-off-by: Boris Brezillon <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
| |
"Collabora, Ltd." should be listed in lieu of simply "Collabora"
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Suggested-by: Daniel Stone <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The `force` arg has been unused for a while.. but apparently I forgot to
garbage collect it.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes: d19fe5e67a39 ("st/glsl: support clamping color outputs in compat for gs/tes")
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This support requires the driver to be a NIR driver as we use the
NIR lowering pass to do the clamping.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
This will be used to add compat profile support for higher GL
versions.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix round64 function to handle round to nearest even cases specially
with positive and negative numbers with fraction part 0.5.
v2: 1) Simplify unused bits (Elie Tournier)
Fixes:
KHR-GL45.gpu_shader_fp64.builtin.round_dvec2
KHR-GL45.gpu_shader_fp64.builtin.round_dvec3
KHR-GL45.gpu_shader_fp64.builtin.round_dvec4
KHR-GL45.gpu_shader_fp64.builtin.roundeven_double
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec2
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec3
KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec4
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
A ton of tests were fixed by this series. A few were incorrectly passing
before (QualityError, for instance) and now are explicitly failing. A
few legitimate regressions but overwhelmingly positive.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes cube map tests due to disagreements between Mesa, dEQP, and the
spec...
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
| |
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|
|
|
|
|
|
| |
Fixes rendering to e.g. alpha textures.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
|