| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the
start of the clip/cull section of the vertex header. Removed use of
hardcoded slot from binner.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Moved from from SWR_RASTSTATE to SWR_BACKEND_STATE.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
SwrStallBE stalls the backend threads until all work submitted before
the stall has finished. The frontend threads can continue to make
forward progress.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
This reverts commit 1cda9a2fee05effd9c64bd773bc6005281593662.
It works now.
|
|
|
|
|
|
|
|
|
|
| |
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.
In one shader, it removes 17 * 4 bytes from the shader binary.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The pass tries to deduce whether tess factors are always written by
all shader invocations.
The implication for radeonsi is that it doesn't have to use a barrier
near the end of TCS, and doesn't have to use LDS for passing the tess
factors to the epilog.
v2: Handle barriers and do the analysis pass for each code segment
surrounded by barriers separately, and AND results from all
such segments writing tess factors. The change is trivial in the main
switch statement.
Also, the result is renamed to "tessfactors_are_def_in_all_invocs"
to make the name accurate.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This also cleans things up.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This will be more useful once we have sync_file support.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Now we should get IB submissions with bo_list == NULL when DRI buffers
aren't referenced.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This kinda fragiile, but it at least unbreaks the driver.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
it appears that texcoord.z/w will be 0 in all cases already,
so just put them into the vbo always.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Now, depth-only clears and custom passes don't read memory in VS.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
Add ZW coordinates to the draw_rectangle callback and use it.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
| |
They are done with instancing.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage
whenever we have to evict shader variants.
Also add some output when shaders are deleted (but not with the perf setting
to keep this one less noisy).
While here, also don't delete that many shaders when we have to evict. For fs,
there's potentially some cost if we have to evict due to the required flush,
however certainly shader recompiles have a high cost too so I don't think
evicting one quarter of the cache size makes sense (and, if we're evicting
based on IR count, we probably typically evict only very few or just one
shader too). For vs, I'm not sure it even makes sense to evict more than
one shader at a time, but keep the logic the same for now.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was implemented since forever, but not enabled.
It passes all piglit tests except one, arb_pipeline_statistics_query-frag.
The reason is that the test (for drawing a 10x10 rect) expects between
100 and 150 pixel shader invocations. But since llvmpipe counts this with
4x4 granularity (and due to the rect being 2 tris) we end up with 224
invocations. I believe however what llvmpipe is doing violates neither the
spirit nor the letter of the spec (our fragment shader granularity really
is 4x4 pixels, albeit we will bail out early on 2x2 or 4x2 (the latter
if AVX is available) granularity), the spec allows to count additional
invocations due to implementation reasons.
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gather is defined in terms of bilinear filtering, just without the filtering
part. However, there's actually some subtle differences required in our
implementation, because we use some tricks to simplify coord wrapping for the
two coords per direction.
For bilinear filtering, we don't care if we end up with an incorrect
texel, as long as the filter weight is 0.0 for it. Likewise, the order of
the texels doesn't actually matter (as long as they still have the correct
filter weight).
But for gather, these tricks lead to incorrect results.
Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions
which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix
right now, and noone really seems to care...).
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This patch aborts shader translation upon indirect indexing of temporary
register on non-vgpu10 device. This prevents non-supported feature
sending to the device.
Tested wth MTT-piglit, glretrace.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This reverts commit 10dec2de2d9f568675d66d736b48701fa26f7b50.
The environment variable is no longer needed with the previous change
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use deinterlace common function
v3: make sure deinterlace only
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
So that it makes more clear for buffer reallocation based
on buffers layout for both decoder and encoder.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
Since it's no longer being called outside of compositor
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
| |
v2: separate helper function in different patch
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
| |
The similar function is in OMX, and only used by OMX. Now have it
moved to vl/compositor for other state tracker to use later.
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Same as before, writing TCS outputs to LDS is rare.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
TCS outputs are usually not written to LDS, so no stats here.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
-44 bytes in a monolithic LS-HS binary.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
-20 bytes in one shader binary. (having only 1 output)
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
-16 bytes in one shader binary.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
It looks like commit 391673af7ad1565a5f6ac8fc2f8c9fcdd1fe9908 that should
have fixed the perf regression didn't really change much if anything.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
It hangs with a high degree of reproducibility.
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
red channel.)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're not particularly concerned with memory usage, if the tradeoff is
shader recompiles. And it's common for apps to have a lot of shaders
nowadays (and, since our shaders include a LOT of context state of course
we may create quite a bit more shaders even).
So quadruple the amount of shaders draw will cache (from 128 to 512).
For llvmpipe (fs shaders) quadruple the number of instructions, keep the
number of variants the same for now (only with very simple, non-texturing
shaders the variant limit could really be reached), and simplify the
definition, it's probably easier to just have one different definition
per branch...
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Fixes:7319ff87("radeon/uvd: add YUYV format support for target buffer")
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|