| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
- use nir_fmul_imm and nir_fadd_imm helpers (Jason)
v3:
- missed one case where we need to replace nir_imm_float
with nir_imm_floatN_t (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2:
- use nir_imm_fmul helper (Jason)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we are outputting the same value to more than one output
component rewrite the inputs to read from a single component.
This will allow the duplicate varying components to be optimised
away by the existing opts.
shader-db results i965 (SKL):
total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
instructions in affected programs: 322601 -> 314257 (-2.59%)
helped: 3080
HURT: 8
total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
cycles in affected programs: 2584925 -> 2522944 (-2.40%)
helped: 2975
HURT: 477
shader-db results radeonsi (VEGA):
SGPRS: 31576 -> 31664 (0.28 %)
VGPRS: 17484 -> 17064 (-2.40 %)
Spilled SGPRs: 184 -> 167 (-9.24 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 583340 -> 569368 (-2.40 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 6162 -> 6270 (1.75 %)
Wait states: 0 -> 0 (0.00 %)
vkpipeline-db results RADV (VEGA):
Totals from affected shaders:
SGPRS: 14880 -> 15080 (1.34 %)
VGPRS: 10872 -> 10888 (0.15 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 674016 -> 668396 (-0.83 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 2708 -> 2704 (-0.15 %)
Wait states: 0 -> 0 (0.00 %
V2: bunch of tidy ups suggested by Jason
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be reused by the following patch.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The following patch will use this with the radeonsi NIR backend
but I've added it to ac so we can use it with RADV in future.
This is a NIR implementation of the tgsi function
tgsi_scan_tess_ctrl().
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code used a do while loop and continues after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.
Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statment
and the loop.
The second bug is fixed in the following patch.
Fixes: 386d165d8d09 ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.
Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This caused random failures for two conditional rendering tests:
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard
These wrote the predicate with the vertex shader, did a barrier and then
started the conditional rendering. However the cache flushes for the barrier
only happen on first draw, so after the predicate has been read.
Fixes: e45ba51ea45 "radv: add support for VK_EXT_conditional_rendering"
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, I get the following error when building the tests with
autotools on i686:
---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Acked-by: Eric Engeström <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, I get the following error when building the tests using
meson on i686:
---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Engeström <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
This logic mirrors what we do on nv50. The relatively new
texture_subdata callback can cause this to happen with 3D textures,
which is triggered at least by xonotic, and probably many piglits.
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the last-active context gets deleted, the pushbuf doesn't have a
bufctx to reference. Then there could be a sequence of binds which would
trigger a reset on that bin before validation was done. Instead we just
pass in the bufctx in question directly.
All other instances of PUSH_RESET happen strictly after a validation is
run.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The whole user_priv thing is a mess, but as long as it's there, it
basically has to map 1:1 to the cur_ctx. Unfortunately we were setting
user_priv to some context, then that context could get deleted without
any draws/validations in it, leading user_priv to become NULL, with
cur_ctx still pointing at some old context. Then we wouldn't run the
switch logic, which in turn led to a NULL bufctx being dereferenced.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation. Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
|
|
|
|
|
|
| |
Fixes failures in
dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d
in the GLES3.1 suite.
|
| |
|
|
|
|
|
|
| |
Fixes
dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat
and others.
|
|
|
|
|
|
| |
This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
|
|
|
|
| |
Noticed while debugging a testcase.
|
|
|
|
|
|
|
|
|
| |
The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.
total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)
|
|
|
|
|
| |
This was just generated work for vir_opt_dead_code and cluttered up the
dumps.
|
|
|
|
| |
I wanted to reuse it for DCE of flags updates.
|
|
|
|
|
| |
It is just shifting probably-means-flags bits out of a value, it doesn't
actually update the flags on its own.
|
|
|
|
|
| |
This was for shader-db, but I haven't cared about NIR instruction counts
in a long time.
|
|
|
|
|
|
|
| |
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.
|
|
|
|
|
|
|
|
|
| |
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
|
|
|
|
| |
Fixes: 248a7fb392ba ("v3d: Do uniform pretty-printing in the QPU dump.")
|
|
|
|
|
|
|
|
|
|
| |
Otherwise there will be symbol collisions for the vector name.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108943
Distro Bug: https://bugs.gentoo.org/673622
Fixes: 42ea0631f108d82554339530d6c88aa1b448af1e
("meson: build clover")
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Useful to spot PIPE_CONTROL flags.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
| |
Makes things easier to read rather than a long block of text.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
| |
Not decoding the shader at the right offset.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
| |
Instruction addresses are always in ppgtt space.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's better to let most applications make use of adaptive sync
by default. Problematic applications can be placed on the blacklist
or the user can manually disable the feature.
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DDX driver can be notified of adaptive sync suitability by
flagging the application's window with the _VARIABLE_REFRESH property.
This property is set on the first swap the application performs
when adaptive_sync is set to true in the drirc.
It's performed here instead of when the loader is initialized for
two reasons:
(1) The window's drawable can be missing during loader init.
This can be observed during the Unigine Superposition benchmark.
(2) Adaptive sync will only be enabled closer to when the application
actually begins rendering.
If adaptive_sync is false then the _VARIABLE_REFRESH property
is deleted on loader init.
The property is only managed on the glx DRI3 backend for now. This
should cover most common applications and games on modern hardware.
Vulkan support can be implemented in a similar manner but would likely
require splitting the function out into a common helper function.
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Applications that don't present at a predictable rate (ie. not games)
shouldn't have adapative sync enabled. This list covers some of the
common desktop compositors, web browsers and video players.
[ Michel Dänzer: Added entry for firefox-esr ]
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This option lets the user decide whether mesa should notify the
window manager / DDX driver that the current application is adaptive
sync capable.
It's off by default.
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some programs start with the path and command line arguments in
argv[0] (program_invocation_name). Chromium is an example of
an application using mesa that does this.
This tries to query the real path for the symbolic link /proc/self/exe
to find the program name instead. It only uses the realpath if it
was a prefix of the invocation to avoid breaking wine programs.
Cc: Timothy Arceri <[email protected]>
Signed-off-by: Nicholas Kazlauskas <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|