| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
The separate stencil buffer was not also getting marked as valid if
written by a draw/clear, resulting in gmem2mem getting skipped. Move
this into fd_batch_resource_used() which also handles the separate
stencil case.
Also fix restore_buffers typo.
Fixes: 4ab6ab80365 freedreno: avoid mem2gmem for invalidated buffers
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Freedreno doesn't treat buffers and images differently, so it's use was
kind of pointless.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise huge amount of spam from instr-a2xx.h.. gcc has no way to know
that freedreno was never built with such an old gcc version to care
about the bugs in old gcc ;-)
Reported-by: Rob Clark <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
[added commit message]
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes: 61d7676df77 "nvc0/ir: add support for 64-bit shift lowering on SM20/SM30"
Fixes fs-shift-scalar-by-scalar.shader_test from piglit for the current
set-up:
uniform int64_t ival -0x7dfcfefbdf6536ff # bit pattern: 0x82030104209ac901
uniform uint64_t uval 0x1400000085010203
uniform int shl 36
uniform int shr 36
uniform int64_t iexpected_shl 0x09ac901000000000
uniform int64_t iexpected_shr -0x7dfcff0 # bit pattern: 0xfffffffff8203010
uniform uint64_t uexpected_shl 0x5010203000000000
uniform uint64_t uexpected_shr 0x0000000001400000
draw rect ortho 12 0 4 4
Signed-off-by: Pierre Moreau <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
V2: make use of driver_location and don't expose NIR to the ABI.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This creates a common function that can be shared by the tgsi
and nir backends.
v2: use LLVMBuildBitCast() directly
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: use LLVMBuildBitCast() directly
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: use LLVMBuildBitCast() directly
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: add emit_gs_epilogue() helper function to reduce duplication.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: make use of existing si_tgsi_emit_epilogue()
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
v2: make use of existing si_tgsi_emit_epilogue()
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Because NIR can create non vec4 variables when implementing component
packing we need to make sure not to reprocess the same slot again.
Also we can drop the fs_attr_idx counter and just use driver_location.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Instructions with no barrier_class can move wrt. an EVERYTHING barrier.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
It isn't just load instructions that have write-after-read hazzard.
Fixes stk gaussian blur compute shaders.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Useful mostly for debugging indirect draw.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In some cases we can end up trying to add a write dependency on ourself,
which shouldn't trigger a flush.
Avoids an extra couple flushes per from in stk.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
ctx->last_fence isn't such a terribly clever idea, if batches can be
flushed out of order. Instead, each batch now holds a fence, which is
created before the batch is flushed (useful for next patch), that later
gets populated after the batch is actually flushed.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In transfer_map(), when we need to flush batches that read from a
resource, we should be holding screen->lock to guard against race
conditions. Somehow deferred flush seems to make this existing
race more obvious.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since almost all BOs will be in one CL at a time, this cache will almost
always hit except for the first usage of the BO in each CL.
This didn't show up as statistically significant on the minetest trace
(n=340), but if I lop off the throttled lobe of the bimodal distribution,
it very clearly does (0.74731% +/- 0.162093%, n=269).
|
|
|
|
|
|
|
|
| |
No significant difference in the minetest replay, but it should reduce
overhead by not requiring that we write quad indices to index buffers that
we repeatedly re-upload (and making the draw packet smaller, as well).
Over the course of the series the actual game seems to be up by 1-2 fps.
|
|
|
|
|
|
|
|
|
| |
Now that there's only one user of it, it's pretty obvious how to avoid
emitting redundant ones. This should save a bunch of kernel validation
overhead.
No statistically sigificant difference on the minetest trace I was looking
at (n=169), but the maximum FPS is up by .3%
|
|
|
|
|
|
| |
Originally there was CL code for handling various relocations back when I
had relocs for the TSDA/TA buffers. Now that the kernel handles those
entirely on its own, I can inline that code into the one place using it.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.
When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535. Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.
|
|
|
|
| |
For triangle strips, we step by max_verts - 2.
|
|
|
|
|
|
|
|
| |
They are the same thing, but this is more consistent with the rest of
the project.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: - Make -Dlmsensors=false work
- Simplify auto and true cases
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
gen_rasterizer*.cpp depends on gen_ar_eventhandler.hpp.
Account for new dependency.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just builds on the image support. Evergreen only has ssbo
for fragment and compute no other stages.
v2: handle images and ssbo in the same shader properly (Ilia)
v3: fix RESQ on buffers,
fix missing atom emit
fix first element offset
use R32 format
write separate buffer rat store path.
(from running deqp gles3.1 tests)
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On cayman it appears the cmp component is now in Z.
Fixes:
arb_shader_image_load_store-dead-fragments on cayman.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These didn't handle the TGSI at all properly, this fixes
them to use the common path for 64->32 then adds the 32->int
on at the end.
Fixes:
generated_tests/spec/arb_gpu_shader_fp64/execution/conversion/*
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
VI has 11 dwords at least. GFX9 has 10 dwords.
Cc: 17.2 17.3 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes issues seen with certain versions of Unreal Engine 4 editor
and games built with that using GLSL 4.30.
v2: add driinfo_gallium change (Emil Velikov)
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97852
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103801
Acked-by: Andres Gomez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Prepare for two texture handling paths, the descriptor-based
path will be added in a future commit. These are structured
so that the texture implementation handles its own state
emission.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
This needs to be shared between texture_plain and texture_desc.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
This will be shared with the texture descriptor path.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Need this to efficiently emit texture descriptor invalidations.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
Track varying component offset of the point size output, as well as
provide the offset of the point coord input.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
Update state objects to add new state, and emit function to emit new
state.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
- This core must load shaders from memory (AFAIK)
- Yet another new location for UNIFORMS
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
Update context reset for HALTI3..HALTI5, sorting states for the HALTI
version that has them.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|