| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Require increase of context buffer size
Signed-off-by: Leo Liu <[email protected]>
Reviewed-by: Boyuan Zhang <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: remove accidental shaderInt16 change
v2: simplify can_move_down initialization
v2: simplify VMEM_CLAUSE_MAX_GRAB_DIST
Reviewed-by: Daniel Schürmann <[email protected]>
|
|
|
|
|
|
|
|
| |
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
| |
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand, increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.
Reviewed-by: Rhys Perry <[email protected]>
|
|
|
|
|
|
|
|
| |
This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
RADV_PERFTEST=outooforder has been removed a while ago. This fixes
dumping the options into hang reports.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Fixes: 6571000071d ("radv: add debug option to turn off in memory cache")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This is required for int64 to be enabled in compat profile.
Signed-off-by: Tapani Pälli <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is actually a non-threaded implementation. I'd summarize this
as event-based submission.
When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submission
we no other dependencies we start to execute them immediately.
Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This does not fully do wait-before-submit, to be done in a follow
up patch.
For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
So we can defer them.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
This is in preparation to adding more types.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
This simplifies code for timeline semaphores by needing to support
less configurations.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
Only signalling it once.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
So we have one place to do queue things if we end up deferring
submissions.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
| |
entries.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
To prevent poisoning arbitrary cache entries.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
If there's nothing to be done, let's actually do nothing. Seems like a
good idea.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Bitfields are a bit more ideomatic than explicit flags, and harder to
get wrong.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
Fixes RADV image noise.
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
| |
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
| |
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This saves one return and a simple benchmark which calls glGetString
repeatedly on my desktop shows it improves calls per second from 123M
to 141M.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1997
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Remove hard coded 16 and use entry_generate_or_patch to patch
public stubs. The generated code actually is sightly tighter
than before since the "nop" instructions before the final "jmp"
get removed.
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
| |
The code works exactly the same with before. Just split this function
out so we can reuse it.
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not
generate PIC (position-independent code). This causes text relocations
which bring troubles on recent versions of FreeBSD, OpenBSD, Android.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Lepton Wu <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan requires that only one bit for the ordering is set, but old
versions of GLSLang just set all the bits. This was fixed as part of
https://github.com/KhronosGroup/glslang/commit/c51287d744fb6e7e9ccc09f6f8451e6c64b1dad6
but we can still find older versions (or shaders compiled with it)
around.
So instead of failing, emit a warning and fallback to the effective
result of any combination of multiple bits: AcquireRelease.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2018
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: (Nanley Chery)
- Fix commit title
- Fix comment
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Decide aux usage in get_copy_region_aux_settings (Nanley Chery)
v3: Use isl_surf_usage_is_stencil function (Nanley Chery)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We have to resolve destination surfaces if we are bliting to and from
the same surface.
v2: Revert unrelated change (Nanley Chery)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
Avoid preparing depth resource, if we did fast depth clear before.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Let aux surface state tracker track the stencil buffer's aux state while
clearing depth stencil buffer.
v2: Fix condition check (Nanley Chery)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though stencil buffer compression looks like regular lossless color
compression w/o fast clear support, we have to resolve stencil buffer
with WM_HZ_OP packet.
v2: Check if resource is stencil with helper function (Nanley Chery)
v3: Remove unnecessary included file (Nanley Chery)
v4: (Nanley Chery)
- Avoid stencil buffer aux state transition by improving condition check
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When set, the stencil buffer is filled with the true stencil values and
we have to disable stencil buffer clear enable bit.
v2: 1) Refactor code little bit (Nanley Chery)
2) Fix assertion (Nanley Chery)
v3: 1) Remove unncessary assignment (Nanley Chery)
2) Fix GEN_GEN check (Nanley Chery)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable stencil compression enable and control surface enable bit if
stencil buffer lossless compression is enabled.
v2: Remove unnecessary GEN_GEN check (Nanley Chery)
v3: (Nanley Chery)
- Change commit subject tag from intel/isl to intel
- Keep assignment order correct
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
| |
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.
Fixes: 0cabf93b80d0 "intel/blorp: Add an entrypoint for clearing depth and stencil"
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen12, the CCS buffer address doesn't have to be referenced in state
packets. In the case of a stencil buffer with CCS, the kernel won't know
the location of the CCS unless an extra call is made to pin its address.
To avoid this extra call, make the CCS part of the main surface.
v2. Update comment above bo_size. (Jordan)
Reviewed-by: Jordan Justen <[email protected]>
|