| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for
the fixed function VPs.
v2: drop unused delete_ir() arg.
Fixes: 3b4929ec6e64 ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB
programs we need to free the old NIR when PSN is used to set up new NIR in
the same gl_program. Additionally, set the base .nir field so that it
will get freed by _mesa_delete_program().
Fixes: 3d7611e9a6c6 ("st/nir: use NIR for asm programs")
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
| |
In preparation for the definition of a function to log a formatted
string.
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mesa_log_msg must provide the length of the string passed into the
KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds
MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number
of characters that would have been written if enough space had been
available.
Fixes: 30256805784450b8bb9d4dabfb56226271ca9d24
("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.")
Reviewed-by: Erik Faye-Lund <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
ARB_fragment_shader_interlock depends on memory fences to
ensure fragment ordering and this ordering guarantee is
only supported from GEN9 onwards.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980
Fixes: 939312702e35 "i965: Add ARB_fragment_shader_interlock support."
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch removes scaling factors introduced in 2a2e69f975b but leaves
option to use scaling in place as it could be useful with other upcoming
YUV formats.
We did this scaling because ffmpeg was shifting channel bits down, however
it seems this is not the right place as compositor wants to flip same
buffers directly to display as well and therefore bitshifting needs to be
done by the client when receiving frame from ffmpeg.
Now P0x formats are treated the same, e.g. P010 is same as P016 but with
lower 6 bits set to zeros.
Fixes: 2a2e69f975b "i965: add P0x formats and propagate required scaling factors"
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a segfault when we try to access the array using a
-1 when the array wasn't allocated in the first place.
Before 7536af670b75 we would just access a pre-allocated array
that was also load/stored to/from the shader cache. But now the
cache will no longer allocate these arrays if they are empty.
The change resulted in tests such as the following segfaulting
when run with a warm shader cache.
tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper. You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.
The hope is that this will fix some intermittent flushing issues as
well as GPU hangs. However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.
Mark Janes noted that this patch helps with some GPU hangs on Icelake.
This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs. The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround. The new code orders them
properly and appears to work.
v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
production hardware.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finnicky as PIPE_CONTROL, the extra safety seems worth it.
We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then
pack them appropriately.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Clearer name.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.
Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
| |
To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
| |
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h
To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.
No functional change.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
for fragment programs we already treat fog as a single component value,
but for vp we didn't.
Fixes fog related piglit tests with my out of tree Nouveau nir patches.
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since the compiler may not zero-out padding in the object.
Add a couple comments about this to prevent misunderstandings in
the future.
Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key decls/inits")
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
| |
Move the variant key declarations inside the scope they're used.
Use designated initializers instead of memset() calls.
Reviewed-by: Neha Bhende <[email protected]>
|
|
|
|
|
|
|
| |
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR and TGSI paths are currently intertwined which makes it
not only hard to follow but also makes it hard to take advantage
of the differences in IR.
Here we take the first step to splitting that path apart. With
this we take the opportunity to no longer call the GLSL IR
optimisation passes after the final lowering calls for NIR. We
can instead just use the NIR passes which can produce better code
and should also result in faster compile times.
The speed-up can be measured in some dolphin uber shaders due to
no longer calling lower_if_to_cond_assign() for example
dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53
seconds on my machine.
There are some code changes as a result of not calling
lower_if_to_cond_assign(), this is because it flattens ifs that
contain UBOs where as NIR's peephole select doesn't. This is
were most of the regressions in Max Waves happens with shader-db.
shader-db results (VEGA):
Totals from affected shaders:
SGPRS: 2349056 -> 2349640 (0.02 %)
VGPRS: 1322160 -> 1323300 (0.09 %)
Spilled SGPRs: 21190 -> 21527 (1.59 %)
Spilled VGPRs: 99 -> 99 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 72 -> 72 (0.00 %) dwords per thread
Code Size: 57260904 -> 57270932 (0.02 %) bytes
Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds
LDS: 786 -> 786 (0.00 %) blocks
Max Waves: 391932 -> 391619 (-0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
| |
We now call this for all drivers in glsl_to_nir() instead.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of looking the devinfo directly, look at the lowering options we
provided to NIR. This is more accurate as it's now checking for "do we
need full software lowering" rather than a hardware bit.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace done using:
find ./src -type f -exec sed -i -- \
's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace done using:
find ./src -type f -exec sed -i -- \
's/record_location_offset(/struct_location_offset(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.
Signed-Off-By: Timur Kristóf <[email protected]>
Tested-by: Andre Heider <[email protected]>
Tested-by: Rob Clark <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now i965 supports mesa_glthread=true like Gallium drivers do.
According to Markus (degasus), the Citra emulator now runs ~30% faster.
Emmanuel (linkmauve) also reported that the Dolphin emulator improved
by 2.8x on one game. (Both of those still need to be added to drirc.)
An Intel Mesa CI run with mesa_glthread=true appears to be happy.
Bioshock Infinite's benchmark mode seems to be around 15-20% faster
on my Skylake GT4 at 1920x1080.
Tested-by: Markus Wick <[email protected]>
Tested-by: Emmanuel Gil Peyrot <[email protected]>
Tested-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize mulExtended to use 32x32->64 multiplication.
Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.
v2: Add missing condition check (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Suggested-by: Matt Turner <Matt Turner <[email protected]>
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
| |
Trivial.
|
|
|
|
|
| |
Replace tabs w/ spaces. 80-column wrapping.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the buffer object usage history tracks if it is
being used as vertex buffer object, we can restrict setting
the ST_NEW_VERTEX_ARRAYS bit to dirty on glBufferData calls to
buffers that are potentially used as vertex buffer object.
Also put a note that the same could be done for index arrays
used in indexed draws.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We already track the usage history for buffer objects
in a lot of aspects. Add GL_ARRAY_BUFFER and
GL_ELEMENT_ARRAY_BUFFER to gl_buffer_object::UsageHistory.
Reviewed-by: Brian Paul <[email protected]>
Signed-off-by: Mathias Fröhlich <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.
v2: Set ES 3.0 as minimum GLES version as required by the extension
Signed-off-by: Gert Wollny <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This might enough for iris and possible r600 (when it gets NIR)
This appears to work for iris.
v2:
* change cap return so DOUBLES == 2 means sw emu
v3:
* Refactor using int64/doubles lowering options which were added
into nir options
* Remove DOUBLES == 2 added in v2
[jordan: Remove "2" value on PIPE_CAP_DOUBLES]
[jordan: Use lowering options added to nir options]
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Usually the uniforms will be assigned locations and have their slots
counted automatically, but for builtin shaders the location assignment
is manual. So count them too otherwise we get num_uniforms == 0.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.
Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.
This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.
Fixes: edded1237607 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added check for higher compat profile being allowed
before assigning certain extensions.
Fixes: 272fe9494232 (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile)
Signed-off-by: Danylo Piliaiev <[email protected]>
Signed-off-by: Yevhenii Kolesnikov <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a failed assertion in glDeleteLists() for the following
case:
list = glGenLists(1);
glDeleteLists(list, 1);
when those are the first display list commands issued by the
application.
When we generate display lists, we plug in empty lists created with
the make_list() helper. This function uses the OPCODE_END_OF_LIST
opcode but does not call dlist_alloc() which would set the
InstSize[OPCODE_END_OF_LIST] element to non-zero.
When the empty list was deleted, we failed the InstSize[opcode] > 0
assertion.
Typically, display lists are created with glNewList/glEndList so we
set InstSize[OPCODE_END_OF_LIST] = 1 in dlist_alloc(). That's why
this bug wasn't found before.
To fix this failure, simply initialize the InstSize[OPCODE_END_OF_LIST]
element in make_list().
The game oolite was hitting this.
Fixes: https://github.com/OoliteProject/oolite/issues/325
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Remove trailing whitespace, replace tabs w/ spaces, etc.
Trivial.
|