| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
This passes all the tests in piglit.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reducing Bucket index calculation to O(1).
This algorithm calculates the index using matrix method. Assuming
PAGE_SIZE is 4096, matrix arrangement is as below:
1*4096 2*4096 3*4096 4*4096
5*4096 6*4096 7*4096 8*4096
10*4096 12*4096 14*4096 16*4096
20*4096 24*4096 28*4096 32*4096
... ... ... ...
... ... ... ...
... ... ... max_cache_size
From this matrix its clearly seen that every row follows the below way:
... ... ... n
n+(1/4)n n+(1/2)n n+(3/4)n 2n
Row is calculated as log2(size/PAGE_SIZE) Column is calculated as
converting the difference between the elements to fit into power size of
two and indexing it.
Final Index is (row*4)+(col-1)
Tested with Intel Mesa CI.
Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)
v4: Review comments on style and code comments implemented (Ian).
v3: Review comments implemented (Ian).
v2: Review comments implemented (Jason).
Signed-off-by: Aravindan Muthukumar <[email protected]>
Signed-off-by: Kedar Karanje <[email protected]>
Reviewed-by: Yogesh Marathe <[email protected]>
Signed-off-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Currently the target has a redundant guard, and the state tracker isn't
properly guarded.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to simplify some guards within the gallium directory.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
| |
Fixes tex-miplevel-selection GL2:texture() 1D
|
|
|
|
|
|
| |
Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order. The actual HW seems to have clamped to
the max anyway.
|
|
|
|
|
| |
We were overflowing, because of all the little 4k allocations for CLs that
were getting expanded to 128kb in the simulator due to the GMP alignment.
|
|
|
|
|
|
| |
Keep non-default simd8 frontend code running for comparison purposes.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
| |
Disabled for now.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
General cleanup, and prep work for possibly moving to llvm masked
gather intrinsic.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
Needed to ensure alignment for avx512.
Fixes address sanitizer crash.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Fixes piglit glsl-1.20:vs-clip-vertex-primitives and
glsl-1.30:vs-clip-distance-primitives.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
| |
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide utility EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
| |
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
|
|
|
| |
We could always do the flush asynchronously, but if we're going to wait
for a fence anyway and the driver thread is currently idle, the additional
communication overhead isn't worth it.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
It is really only required when we need to flush for deferred fences.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes: c9fefa062b36 ("ddebug: rewrite to always use a threaded approach")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes: b47727a83ad6 ("ddebug: implement pipelined hang detection mode")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This got lost in a rebase but never hurt anything because we happened
to always sync in fence_finish anyway...
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The relevant define changed in the final revision of the simple mutex
patch.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With threaded gallium, the driver may currently be running in another
thread. In that case, we will execute all remaining commands in that
thread instead of syncing, which should be better for cache locality.
Reviewed-by: Andres Rodriguez <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Asynchronous flushes require a proper implementation of
st_server_wait_sync, because we could have the following with
threaded Gallium:
Context 1 app Context 1 driver Context 2
------------- ---------------- ---------
f = glFenceSync
glFlush
<-- app sync --> <-- app sync -->
glWaitSync(f)
.. draw calls ..
pipe_context::flush
for glFenceSync
pipe_context::flush
for glFlush
Reviewed-by: Andres Rodriguez <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
The whole point of fence_server_sync is that it can be used to
avoid waiting in the application thread.
Reviewed-by: Andres Rodriguez <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It is required for LLVM anyway.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103658
Fixes: 7f33e94e43a6 ("amd/addrlib: update to latest version")
Tested-by: Vinson Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We need to account for SGPR locations in merged shaders.
This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations
Fixes: 79c2e7388c7f ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
This allows to update them with only one memcpy().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
As per the spec, the query identified by queryPool and query
must currently be active. Applications have to call vkCmdBeginQuery()
before, and thus the query pool BO will already be in the list.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Similar to how the driver sets the depth clear regs after a
fast depth clear. Most of the time, this will copy a 32-bit reg
instead of a 64-bit reg.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For the fast path, radv_fill_buffer() ensures that the BO is
already in the list. For the slow path, the depth surface is
part of the framebuffer which means the BO is added to the list
when the framebuffer is emitted.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
Already checked in emit_clear().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
aspects can't be zero and there is an assertion that ensures
it's not in emit_clear().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
These just looked to be missed when this file was updated.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.
Fixes: fd01840c0bd3 ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
|
|
|
|
|
|
|
|
| |
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats. Now there are proper formats for them.
Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
|
|
|
|
|
|
|
|
| |
When we tried to clear color while storing depth, it assertion failed
about basically not having enough information to decide which color RT to
clear. It turns out the STORE_GENERAL picks the buffer according to the
color buffer being stored, or all of them if NONE. If you're doing depth,
it doesn't know which to pick.
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
The OVERWRITE bit disables destination fetches, which is exactly what
we want when there is no valid color buffer bound.
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Wladimir J. van der Laan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header. Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals. Instead, make the couple of forward
declarations we need and make the header more stand-alone. This fixes
the meson build.
Reviewed-by: Matt Turner <[email protected]>
Fixes: 4f82b17287194ca7d10816f6cfe4712a3e0a03fc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Almost all of our BO export paths were already properly marked the BO as
external and added it to the handle table. Most export use-cases go
through a prime fd or flink where we have a brw_bo export helper that
does the right thing. The one missing one happens when you call
queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the
gem handle out of the BO (because it's really easy to do that) and
handed it off to the client; what could go wrong? As it turns out, this
path is used by basically every compositor that wants to turn around and
call drmModeAddFB2 on it so it can hand it off to display. The result,
as of 4b1e70cc57d7ff5f465544644b2180dee1490cee, is that we no longer set
MOCS_PTE on those surfaces and the kernel's attempts to disable caching
fail and we scanout gets corruption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759
Fixes: 4b1e70cc57d7ff5f465544644b2180dee1490cee
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
Fixes: 4f82b1728719 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
This centralizes the calculation in the surface, instead of in each
load/store.
|
|
|
|
|
| |
This should fix some GPU hangs in our (currently always single-threaded)
fragment shaders, and definitely fixes assertion failures in simulation.
|
|
|
|
| |
Fixes dEQP-GLES3.functional.depth_stencil_clear.depth.*
|
|
|
|
| |
Fixes piglit masked-clear.
|