| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mark the functions 'exec="skip"' in the XML instead. libGL will still
have the functions, but the driver won't try to use them. I verified
that this commit works with piglit's 'object-namespace-pollution glClear
vertex-array' on x64 with a driver built from mesa-12.0.3 tag.
In fairness, this test also works with a libGL built from 7927d03. I
believe it continues to work because on non-Windows platforms we
generate some extra, dummy dispatch functions that can be used when a
driver requests a function unknown to libGL. This was done to provide
some "forward" compatibility with drivers that need more functions.
This doesn't work on Windows because the Windows calling convention is
for the callee to clean up the stack. That's the theory anyway.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we lower vars to regs, we no longer regress for anything that
does complex dereferences. (With tgsi, derefers are already lowered
before tgsi_to_nir, but not with glsl_to_nir.) In fact it actually
fixes a few things to bypass tgsi.
So make NIR the default (finally!)
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of using load/store_var intrinsics, which can have complex
derefs in the case of multi-dimensional arrays, lower these to regs
and handle the direct/indirect loads in get_src() and stores in
put_dst().
This should let us switch to using nir by default.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
standalone_compiler_cleanup() frees the glsl types, among other things,
so it needs to come after nir->ir3. But since we exit after dumping the
disassembly, it is easier to just not call it at all.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
vertex_id_zero_based differs..
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
Won't ever hit this w/ a420 gpu, so this is dead code. Need to get astc
working to know whether to rip this out entirely or not.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
tgsi_to_nir does not support them. Note that compute shaders already
force nir.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
| |
Maybe there is a better way to do this. But by the time we get to
assigning uniform locs, we want the atomic_uint's to all be gone,
otherwise we assert in st_glsl_attrib_type_size().
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
Fixes some piglits like arb_shader_atomic_counters-active-counters
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I wasn't sure if I should filter the flags so that we only use
flags that actually change the shader output. To avoid manual
updates we just pass in everything for now.
Reviewed-by: Eduardo Lima Mitev <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will be used for things such as adding driver specific environment
variables to the key. Allowing us to set environment vars that change
the shader and not have the driver ignore them if it finds existing
shaders in the cache.
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This prevents spurious failures when libtxc-dxtn-s2tc is installed.
Note: lp_test_format doesn't need any change since we were already
ignoring S3TC failures there.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Rhys Kidd <[email protected]>
|
|
|
|
|
|
|
| |
v2: Remove the '+' after bxt
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
| |
This platform passes the following GLES3 tests:
ES3-CTS.functional.texture.compressed.astc.endpoint_value_hdr_cem_*
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
Signed-off-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Not really what the fast depth clear does, no matter whether you use
EXPCLEAR or not. Seems the fast clear using the DB HW always touches
the main buffer.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Did some RE'ing what several HTILE words give when read from a descriptor
with HTILE compression enabled.
Seems to align with -pro usage for D16 too.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
And correct implementation to specify only what we support.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
We never use EXPCLEAR clears.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
7038459 235248 37280 7310987 6f8e8b 32-bit i965_dri.so before
7038227 235248 37280 7310755 6f8da3 32-bit i965_dri.so after
6681438 303400 50608 7035446 6b5a36 64-bit i965_dri.so before
6681254 303400 50608 7035262 6b597e 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Definitive Edition
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
| |
so that LLVM doesn't allocate SGPRs where XNACK is.
Cc: 17.1 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Coverity caught the use of dead code copy-paste for
found_colors[] and num_found_colors.
CID: 1341850
Signed-off-by: Rhys Kidd <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We're already verified that 'window' wasn't NULL, I'm guessing this
allocation error is about the newly created queue.
CID: 1409754
Fixes: 03dd9a88b0b ("egl/wayland: Use per-surface event queues")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
APPLE_vertex_array_object support was removed in 7927d0378fc7.
However it turns out we can't remove the functions because this
can cause issues when libglapi is used together with DRI
drivers built prior to said commit
Fixes: 7927d0378fc ("mesa: drop APPLE_vertex_array_object support")
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Potentially more efficient as it may avoid the struct being initialised
twice.
Also add var to the initialisation list while we are here.
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
If the str is long or isn't null-terminated, strlen() could take a lot
of time or even crash. I don't know why was it used in the first place,
maybe for platforms without strnlen(), but strnlen() is already used
inside of ralloc_strndup(), so this change should not additionally
break anything.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Overwhelming majority of shaders don't use line continuations. In my
shader-db only shaders from the Talos Principle and Serious Sam used
them, less than 1% out of all shaders. Optimize for this case, don't
do any copying if no line continuation was found.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
strcmp() is slow. Initiate comparison with "__LINE__" or "__FILE__"
only if the identifier starts with '_', which is rare.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.
Even if we are not on an architecture that is 4 byte aligned, there is
still a high big chance that the thing we are allocating is at least
8 bytes in size, so even then we will have entropy into the third bit.
The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painfull.
This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seems to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.
perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles. Shader-db runtime goes
from 225 to 220 seconds.
No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that is likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;
total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before the swapchain event queue is destroyed, all proxy objects that reference
it must be dropped. Otherwise we risk a use-after-free if a frame callback event
or buffer release events are received afterwards.
This happens when an application destroys and recreates a swapchain in FIFO
mode between two frames without using the VkSwapchainCreateInfoKHR::oldSwapchain
mechanism to keep the old swapchain until after the next redraw.
Fixes: 5034c615582a ("vulkan/wsi/wayland: Use proxy wrappers for swapchain")
Signed-off-by: Philipp Zabel <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dead Island Definitive Edition
This fixes the long-standing problem with Dying Light where the game would
produce a black screen when running under Mesa. This happened because the
game's vertex shaders redeclare gl_VertexID, which is a GLSL builtin.
Mesa's GLSL compiler is a little more strict than others, and would not
compile them:
error: `gl_VertexID' redeclared
The allow_glsl_builtin_variable_redeclaration directive allows the shaders
to compile and the game to render. The game also requires OpenGL 4.4+ (GLSL
440), but does not request it explicitly. It must be forced with an
override, such as MESA_GL_VERSION_OVERRIDE=4.5 and
MESA_GLSL_VERSION_OVERRIDE=450. A compatibility context is *not* required
and forcing one with 4.5COMPAT or allow_higher_compat_version results in
graphical artifacts.
Dead Island Definitive Edition is another Techland port on the same engine
with the same problems, so we set the
allow_glsl_builtin_variable_redeclaration option for that game as well.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96449
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Conditional on allow_glsl_builtin_variable_redeclaration driconf option.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
- style: put spaces after 'if'
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option will allow GLSL builtins to be redeclared verbatim (e.g.
redeclaring "in int gl_VertexID" in a vertex shader). This is not strictly
valid and would normally fail to compile, but some applications (such as
newer Techland ports) do it and need more leniency.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The previous condition was to clear it out if it had previously been
set, not what's in the current draw. That information is gone now, so
just clear it unconditionally.
Fixes: 330d0607e ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Use a user-buffer-aware cleanup function.
Fixes: c24c3b94ed ("gallium: decrease the size of pipe_vertex_buffer - 24 -> 16 bytes")
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can easily use the upload BO for push constants on Gen7.5/Gen8 too,
at the cost of a relocation when emitting 3DSTATE_CONSTANT_XS. We can
simply switch to using constant buffer pointer 2 instead of pointer 0,
like we do on Gen9+.
Ivybridge and Baytrail can't do this trick because they require the
constant buffers to be enabled in order, starting with 0. We'd have
to set the INSTPM bit to make the constant buffer pointer not relative
to dynamic state base address, which would need kernel command parser
support.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Broadwell GT2: 0.305608% +/- 0.19877% (n = 68)
- Braswell: No difference proven (n = 742)
- Haswell GT3e: 0.180755% +/- 0.0237505% (n = 30)
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shaders can use quite a bit of uniform data. Better to put it in the
upload buffers, like we do for client vertex data, rather than the
batch buffer state area, which is primarly used for indirect state.
This should free up batch space, allowing us to emit more commands in a
batch before flushing. Because BRW_NEW_BATCH also causes a lot of state
to be re-emitted, it may also reduce CPU overhead a little bit.
We took this approach on Gen4-5, but switched to using the batch area
on Gen6+ because buffer 0 is relative to Dynamic State Base Address by
default, which is set to the start of the batch.
On Gen9+, we already use a relocation due to a workaround, so this is
trivial to change and has basically no downside.
Unfortunately we can't change compute shader push constants because
MEDIA_CURBE_LOAD always uses an offset from dynamic state base address.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Skylake GT4e: 0.52821% +/- 0.113402% (n = 190)
- Apollolake: 0.510225% +/- 0.273064% (n = 70)
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I don't think CS push constant uploading uses the section of L3
controlled by 3DSTATE_PUSH_CONSTANT_ALLOC_XS. So I don't think
it needs to be re-emitted when that space is reallocated.
The programming note in gen7_allocate_push_constants doesn't
indicate this is necessary, at least.
Reviewed-by: Chris Forbes <[email protected]>
|