| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Also updates gl_spirv to pick the right one. At the moment nothing
uses it, but upcoming functionality part of ARB_gl_spirv will use it,
and we also later can be more assertful when handling certain features
for each of the execution environments.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Acked-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a
sufficiently high ESSL feature level, even if the GLSL feature level
isn't high enough.
This allows drivers to support EXT_gpu_shader5 in GLES contexts before
they support all the additional features of ARB_gpu_shader5 in GL
contexts.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Cc: [email protected]
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits. Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.
This seems like a pretty big assumption to make. Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).
The i965 driver performs actions on these memory barriers. Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache). All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible). The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.
I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.
Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.
Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In all instances here we can replace pipe_sampler_view_release(pipe,
view) with pipe_sampler_view_reference(view, NULL) because the views
in question are private to the state tracker context. So there's no
danger of freeing a sampler view with the wrong context.
Testing done: google chrome, misc GL demos, games
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Neha Bhende <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Reviewed-By: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As with the preceding patch for sampler views, this patch does
basically the same thing but for shaders. However, reference counting
isn't needed here (instead of calling cso_delete_XXX_shader() we call
st_save_zombie_shader().
The Redway3D Watch is one app/demo that needs this change. Otherwise,
the vmwgfx driver generates an error about trying to destroy a shader
ID that doesn't exist in the context.
Note that if PIPE_CAP_SHAREABLE_SHADERS = TRUE, then we can use/delete
any shader with any context and this mechanism is not used.
Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos
and a few Linux games.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Neha Bhende <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Reviewed-By: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When st_texture_release_all_sampler_views() is called the texture may
have sampler views belonging to several contexts. If we unreference a
sampler view and its refcount hits zero, we need to be sure to destroy
the sampler view with the same context which created it.
This was not the case with the previous code which used
pipe_sampler_view_release(). That function could end up freeing a
sampler view with a context different than the one which created it.
In the case of the VMware svga driver, we detected this but leaked the
sampler view. This led to a crash with google-chrome when the kernel
module had too many sampler views. VMware bug 2274734.
Alternately, if we try to delete a sampler view with the correct
context, we may be "reaching into" a context which is active on
another thread. That's not safe.
To fix these issues this patch adds a per-context list of "zombie"
sampler views. These are views which are to be freed at some point
when the context is active. Other contexts may safely add sampler
views to the zombie list at any time (it's mutex protected). This
avoids the context/view ownership mix-ups we had before.
Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos
a few Linux games. If anyone can recomment some other multi-threaded,
multi-context GL apps to test, please let me know.
v2: avoid potential race issue by always adding sampler views to the
zombie list if the view's context doesn't match the current context,
ignoring the refcount.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Neha Bhende <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
Reviewed-By: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
| |
Make sure the inde_size parameter is meant to be in bytes.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
| |
The maximum value primitive restart index is different for each index data
type. Use the appropriate fixed restart index value.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The standard requires that the primitive restart comparison happens before
the basevertex value is added. Do this now, drop a reference to the standard
why this happens at this place.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since mapping and unmapping the buffer objects in a VAO is handled
directly from the VAO, this part of the _NEW_ARRAY state is no longer
used. So remove this part of array element state.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions
should provide comparable runtime efficienty to the currently used
_ae_{,un}map_vbos functions. So use this functions and enable
further cleanup.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Make use of the newly factored out _mesa_array_element function
in display list compilation. For now that duplicates out the
primitive restart logic. But that turns out to need a fix in
display list handling anyhow.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The factored out function handles emitting the vertex attributes
at the given index. The now public accessible function gets used
in the following patches.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Provide a set of functions that maps or unmaps all VBOs held
in a VAO. The functions will be used in the following patches.
v2: Update comments.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead, we do UBO and SSBO deref lowering in NIR after we've given it a
chance to optimize SSBO access:
Shader-db results on Kaby Lake:
total instructions in shared programs: 15235775 -> 15235484 (<.01%)
instructions in affected programs: 14992 -> 14701 (-1.94%)
helped: 19
HURT: 20
total cycles in shared programs: 339220331 -> 339027307 (-0.06%)
cycles in affected programs: 79831981 -> 79638957 (-0.24%)
helped: 540
HURT: 602
total loops in shared programs: 4402 -> 4348 (-1.23%)
loops in affected programs: 186 -> 132 (-29.03%)
helped: 27
HURT: 0
total spills in shared programs: 23261 -> 23234 (-0.12%)
spills in affected programs: 38 -> 11 (-71.05%)
helped: 1
HURT: 0
total fills in shared programs: 31442 -> 31371 (-0.23%)
fills in affected programs: 98 -> 27 (-72.45%)
helped: 1
HURT: 0
LOST: 12
GAINED: 12
Most of the help and hurt in instruction counts was just churn caused by
re-ordering of optimizations and the fact that the NIR deref lowering
code is emitting slightly different instructions. Nothing was hurt by
more than three instructions and most things weren't helped by more than
four. The primary exception to this is one Car Chase shader:
shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%)
There is also one compute shader in Manhattan 3.1 and a fragment shader
in the UE4 Shooter Game demo that now get a loop partially unrolled.
Those showed up in the results as hurt instructions but were manually
removed to get the results above.
The lost/gained was a dozen Car Chase shaders that went from SIMD8 to
SIMD16 thanks to improved register pressure:
shaders/non-free/gfxbench4/carchase/366.shader_test CS
shaders/non-free/gfxbench4/carchase/368.shader_test CS
shaders/non-free/gfxbench4/carchase/370.shader_test CS
shaders/non-free/gfxbench4/carchase/372.shader_test CS
shaders/non-free/gfxbench4/carchase/376.shader_test CS
shaders/non-free/gfxbench4/carchase/378.shader_test CS
shaders/non-free/gfxbench4/carchase/380.shader_test CS
shaders/non-free/gfxbench4/carchase/382.shader_test CS
shaders/non-free/gfxbench4/carchase/384.shader_test CS
shaders/non-free/gfxbench4/carchase/388.shader_test CS
shaders/non-free/gfxbench4/carchase/4.shader_test CS
shaders/non-free/gfxbench4/carchase/6.shader_test CS
Given how much it appeared to be improved, I ran Car Chase on my laptop.
Unfortunately, I wasn't able to see any measurable improvement. It
might be helped by 1-2% but it's in the noise. It does render correctly
as far as I can tell so the improvement is legitimate.
All of the loops that got delete were in dolphin uber shaders. I've had
no opportunity to test them for correctness or performance.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for
the fixed function VPs.
v2: drop unused delete_ir() arg.
Fixes: 3b4929ec6e64 ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB
programs we need to free the old NIR when PSN is used to set up new NIR in
the same gl_program. Additionally, set the base .nir field so that it
will get freed by _mesa_delete_program().
Fixes: 3d7611e9a6c6 ("st/nir: use NIR for asm programs")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for the definition of a function to log a formatted
string.
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mesa_log_msg must provide the length of the string passed into the
KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds
MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number
of characters that would have been written if enough space had been
available.
Fixes: 30256805784450b8bb9d4dabfb56226271ca9d24
("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.")
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_fragment_shader_interlock depends on memory fences to
ensure fragment ordering and this ordering guarantee is
only supported from GEN9 onwards.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980
Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support."
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch removes scaling factors introduced in 2a2e69f975b but leaves
option to use scaling in place as it could be useful with other upcoming
YUV formats.
We did this scaling because ffmpeg was shifting channel bits down, however
it seems this is not the right place as compositor wants to flip same
buffers directly to display as well and therefore bitshifting needs to be
done by the client when receiving frame from ffmpeg.
Now P0x formats are treated the same, e.g. P010 is same as P016 but with
lower 6 bits set to zeros.
Fixes: 2a2e69f975b "i965: add P0x formats and propagate required scaling factors"
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a segfault when we try to access the array using a
-1 when the array wasn't allocated in the first place.
Before 7536af670b75 we would just access a pre-allocated array
that was also load/stored to/from the shader cache. But now the
cache will no longer allocate these arrays if they are empty.
The change resulted in tests such as the following segfaulting
when run with a warm shader cache.
tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper. You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.
The hope is that this will fix some intermittent flushing issues as
well as GPU hangs. However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.
Mark Janes noted that this patch helps with some GPU hangs on Icelake.
This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs. The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround. The new code orders them
properly and appears to work.
v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
production hardware.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finnicky as PIPE_CONTROL, the extra safety seems worth it.
We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then
pack them appropriately.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Clearer name.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.
Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
| |
To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
| |
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h
To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.
No functional change.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
for fragment programs we already treat fog as a single component value,
but for vp we didn't.
Fixes fog related piglit tests with my out of tree Nouveau nir patches.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since the compiler may not zero-out padding in the object.
Add a couple comments about this to prevent misunderstandings in
the future.
Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key decls/inits")
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
| |
Move the variant key declarations inside the scope they're used.
Use designated initializers instead of memset() calls.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
| |
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR and TGSI paths are currently intertwined which makes it
not only hard to follow but also makes it hard to take advantage
of the differences in IR.
Here we take the first step to splitting that path apart. With
this we take the opportunity to no longer call the GLSL IR
optimisation passes after the final lowering calls for NIR. We
can instead just use the NIR passes which can produce better code
and should also result in faster compile times.
The speed-up can be measured in some dolphin uber shaders due to
no longer calling lower_if_to_cond_assign() for example
dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53
seconds on my machine.
There are some code changes as a result of not calling
lower_if_to_cond_assign(), this is because it flattens ifs that
contain UBOs where as NIR's peephole select doesn't. This is
were most of the regressions in Max Waves happens with shader-db.
shader-db results (VEGA):
Totals from affected shaders:
SGPRS: 2349056 -> 2349640 (0.02 %)
VGPRS: 1322160 -> 1323300 (0.09 %)
Spilled SGPRs: 21190 -> 21527 (1.59 %)
Spilled VGPRs: 99 -> 99 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 72 -> 72 (0.00 %) dwords per thread
Code Size: 57260904 -> 57270932 (0.02 %) bytes
Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds
LDS: 786 -> 786 (0.00 %) blocks
Max Waves: 391932 -> 391619 (-0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
We now call this for all drivers in glsl_to_nir() instead.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of looking the devinfo directly, look at the lowering options we
provided to NIR. This is more accurate as it's now checking for "do we
need full software lowering" rather than a hardware bit.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|