| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This fixes a performance issue with the shader cache that delayed Gallium
shader create calls until draw calls.
I'd like this in stable, but it's not a showstopper.
Cc: 17.1 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
7038459 235248 37280 7310987 6f8e8b 32-bit i965_dri.so before
7038227 235248 37280 7310755 6f8da3 32-bit i965_dri.so after
6681438 303400 50608 7035446 6b5a36 64-bit i965_dri.so before
6681254 303400 50608 7035262 6b597e 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option will allow GLSL builtins to be redeclared verbatim (e.g.
redeclaring "in int gl_VertexID" in a vertex shader). This is not strictly
valid and would normally fail to compile, but some applications (such as
newer Techland ports) do it and need more leniency.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we fallback currently the gl_program objects are re-allocated.
This is likely to change when the i965 cache lands, but for now
this fixes a crash when using MESA_GLSL=cache_fb. This env var
simulates the fallback path taken when a tgsi cache item doesn't
exist due to being evicted previously or some kind of error.
Unlike i965 we are always falling back at link time so it's safe to
just re-allocate everything. We will be unnecessarily freeing and
re-allocate a bunch of things here but it's probably not a huge deal,
and can be changed when the i965 code lands.
Fixes: 0e9991f957e2 ("glsl: don't reference shader prog data during cache fallback")
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the gallium state tracker a tgsi binary may have been evicted
from the cache to make space. In this case we would take the
fallback path and recompile/link the shader.
On i965 there are a number of reasons we can get to the program
upload stage and have neither IR nor a valid cached binary.
For example the binary may have been evicted from the cache or
we need a variant that wasn't previously cached.
This environment variable enables us to force the fallback path that
would be taken in these cases and makes it easier to debug these
otherwise hard to reproduce scenarios.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will explicitly state that we are following the fallback
path when we find invalid/corrupt cache items. It will also
output the fallback message when the fallback path is forced
via an environment variable, the following patches will allow
this.
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Only the first array element was declared, so tgsi_shader_info::
shader_buffers_declared didn't match what the shader was using.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
RadeonSI needs to do a special lowering for Gather4 with integer
formats, but with bindless samplers we just can't access the index.
Instead, store the return type in the instruction like the target.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There is really no reason why the current DrawBuffer needs to be complete
at this point. In particular, the assertion gets hit on the X server side
in libglx when running .../piglit/bin/glx-get-current-display-ext -auto
(which uses indirect GLX rendering).
Fixes: 19b61799e3d0 ("st/mesa: don't cast the incomplete framebufer to st_framebuffer")
Reported-by: Michel Dänzer <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
for skipping mapped-buffer checking in every GL draw call
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't reject YUV formats that the driver doesn't handle natively, since
mesa/st already knows how to lower this in shader.
Reported-by: Nicolas Dechesne <[email protected]>
Fixes: 83e9de2 ("st/mesa: EGLImageTarget* error handling")
Cc: 17.1 <[email protected]
Signed-off-by: Rob Clark <[email protected]>
Tested-by: Nicolas Dechesne <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
State trackers can set this to tell the driver when u_threaded_context is
desirable.
Reviewed-by: Nicolai Hähnle <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The function was pretty slow. This brings a substantial decrease in draw
call overhead when min/max index bounds are not needed:
Before: DrawElements (1 VBO) w/ no state change: 5.75 million
After: DrawElements (1 VBO) w/ no state change: 7.03 million
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This is the best place to do it. Now drivers without u_vbuf don't have to
do it.
v2: use correct upload size and optimal alignment
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
|
|
|
|
| |
For faster initialization of non-indirect draws.
|
| |
|
|
|
|
|
|
|
|
| |
Similar to how image resources are handled. That way we are sure
that inst->resource.file is PROGRAM_SAMPLER for "bound" samplers.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: use the st_common_program() helper
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It contains only one member: the update function. Let's use the update
function directly.
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
When the arrays are initialized later on with -1, that's useless
to use rzalloc_array().
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
After calling this we were then overriding all the functions with
st versions.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
For the explicit pack/unpack conversions.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TGSI DCE pass doesn't eliminate dead assignments like
MOV TEMP[0], TEMP[1] in presence of loops because it assumes
that the visitor doesn't emit dead code. This assumption is
actually wrong and this situation happens.
However, it appears that the merge_registers() pass accidentally
takes care of this for some weird reasons. But since this pass has
been disabled for RadeonSI and Nouveau, the renumber_registers()
pass which is called *after*, can't do its job correctly.
This is because it assumes that no dead code is present. But if
there is still a dead assignment, it might re-use the TEMP
register id incorrectly and emits wrong code.
This patches fixes the issue by recording writes instead of reads,
and this has the advantage to be faster.
This should fix Unigine Heaven on RadeonSI and Nouveau.
shader-db results with RadeonSI:
47109 shaders in 29632 tests
Totals:
SGPRS: 1923308 -> 1923316 (0.00 %)
VGPRS: 1133843 -> 1133847 (0.00 %)
Spilled SGPRs: 2516 -> 2518 (0.08 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60095968 -> 60096256 (0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431889 -> 431889 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
It's still interesting to disable the merge_registers() pass.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't make sense to prefix them with 'image' because
they are called "Memory Qualifiers" and they can be applied
to members of storage buffer blocks.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Andres Gomez <[email protected]>
|
|
|
|
|
|
|
| |
This is already set for the instruction at initialisation.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
also remove the incorrect comment about primitive restart.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VMware driver has a limited set of integer texture formats. We
often have to fall back to 4-component formats when 1- or 2-component
formats are missing.
This fixes about 8 integer texture Piglit tests with the VMware driver
on Linux. We've had this code in-house for a long time but I guess it
was never up-streamed to Mesa master.
This shouldn't regress any other drivers since we're either choosing
an earlier format in the list, or failing anyway.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Trivial.
|
|
|
|
|
|
|
|
|
|
| |
stfb->iface is always non-NULL for an st_framebuffer. These checks
were incorrect, relying on out-of-bounds memory access in the
surface-less case of EGL_KHR_surfaceless_context.
v2: remove redundant stread check (Marek)
Reviewed-by: Marek Olšák <marek@[email protected]> (v2)
|