| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Instead of emitting absolutely everything, just emit the few functions
that are actually referenced in some way by the entrypoint. This should
save us quite a bit of time when handed large shader modules containing
many entrypoints.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We have a nir_builder and it has an impl field.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes MESA_GLSL=cache_fb with piglit
tests/spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out.shader_test
Fixes: 0610a624a12 i965/link: Serialize program to nir after linking for shader cache
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103988
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Since almost all BOs will be in one CL at a time, this cache will almost
always hit except for the first usage of the BO in each CL.
This didn't show up as statistically significant on the minetest trace
(n=340), but if I lop off the throttled lobe of the bimodal distribution,
it very clearly does (0.74731% +/- 0.162093%, n=269).
|
|
|
|
|
|
|
|
| |
No significant difference in the minetest replay, but it should reduce
overhead by not requiring that we write quad indices to index buffers that
we repeatedly re-upload (and making the draw packet smaller, as well).
Over the course of the series the actual game seems to be up by 1-2 fps.
|
| |
|
|
|
|
|
|
|
|
|
| |
Now that there's only one user of it, it's pretty obvious how to avoid
emitting redundant ones. This should save a bunch of kernel validation
overhead.
No statistically sigificant difference on the minetest trace I was looking
at (n=169), but the maximum FPS is up by .3%
|
|
|
|
|
|
| |
Originally there was CL code for handling various relocations back when I
had relocs for the TSDA/TA buffers. Now that the kernel handles those
entirely on its own, I can inline that code into the one place using it.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.
When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535. Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.
|
|
|
|
| |
For triangle strips, we step by max_verts - 2.
|
|
|
|
|
|
|
|
| |
They are the same thing, but this is more consistent with the rest of
the project.
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: - Make -Dlmsensors=false work
- Simplify auto and true cases
Signed-off-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When we look up the DRI drawable state we need to associate an fbconfig
with the drawable. With GLX_EXT_no_config_context we can no longer infer
that from the context and must instead query the server.
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
This also fixes a bug, the error path through MakeCurrent didn't
translate the error code by the extension's error base.
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
The dummy vtable has these slots as NULL already, no need to check for
the dummy context explicitly.
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation is a simple 'return EGL_FALSE'. Stop pretending and
simply remove it.
Note: the removal of XMesa API is fine, since there hasn't been any
users for it in years.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The extension was never implemented and seemingly never will.
The DRI based libGL dropped support for it over 10 years ago.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The extension was never implemented and seemingly never will.
The DRI based libGL dropped support for it over 10 years ago.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Adam Jackson <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I believe the workaround describes that the MI_LOAD_REGISTER_IMM should
come right after the 3DSTATE_SAMPLE_PATTERN.
This fixes GPU hangs in the i965 initial state batchbuffer when running
some Piglit tests with always_flush_batch=true.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The bspec describes:
"WA: Clear tdr register before send EOT in all non-PS shader kernels
mov(8) tdr0:ud 0x0:ud {NoMask}"
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On CNL, we see multiple multisample failures on piglit tests. By
emitting this extra state, though not documented in the bspec, those
failures seem to go away.
This workaround could be removed if we ever find out a better solution,
but it should be good enough for now.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit 3e57e9494c2279580ad6a83ab8c065d01e7e634e
which caused a bunch of GPU hangs on several Source titles. To date, we
have no clue why these hangs are actually happening. This undoes the
final effect of 3e57e9494c227 and gets us back to not hanging. Tested
with Team Fortress 2.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102435
Fixes: 3e57e9494c2279580ad6a83ab8c065d01e7e634e
Cc: [email protected]
|
|
|
|
|
|
|
|
| |
In this condition dri2_dpy->driver_name string always equals
NULL, so call to free() is useless
Signed-off-by: Vadym Shovkoplias <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
deviceName string is declared, assigned and freed but actually
never used in dri3_create_screen() function.
Fixes: 2d94601582e ("Add DRI3+Present loader")
Signed-off-by: Vadym Shovkoplias <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
gen_rasterizer*.cpp depends on gen_ar_eventhandler.hpp.
Account for new dependency.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"vkAllocateCommandBuffers can be used to create multiple command
buffers. If the creation of any of those command buffers fails, the
implementation must destroy all successfully created command buffer
objects from this command, set all entries of the pCommandBuffers
array to NULL and return the error."
This has been suggested by [email protected].
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
It's really annoying and this pollutes the output especially
when a bunch of non-meta shaders are compiled.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This just builds on the image support. Evergreen only has ssbo
for fragment and compute no other stages.
v2: handle images and ssbo in the same shader properly (Ilia)
v3: fix RESQ on buffers,
fix missing atom emit
fix first element offset
use R32 format
write separate buffer rat store path.
(from running deqp gles3.1 tests)
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On cayman it appears the cmp component is now in Z.
Fixes:
arb_shader_image_load_store-dead-fragments on cayman.
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These didn't handle the TGSI at all properly, this fixes
them to use the common path for 64->32 then adds the 32->int
on at the end.
Fixes:
generated_tests/spec/arb_gpu_shader_fp64/execution/conversion/*
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The idea is ported from RadeonSI, but using 512x512 instead of
256x256 seems slightly better. This improves dota2 performance
by +2%.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
The state no longer exists on GFX9.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
| |
The same code for VI doesn't check for scanout either.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
VI has 11 dwords at least. GFX9 has 10 dwords.
Cc: 17.2 17.3 <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jon Turney <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jon Turney <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
Acked-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jon Turney <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.
Fixes: 5ae8043fed2 ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc83 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Andres Gomez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes issues seen with certain versions of Unreal Engine 4 editor
and games built with that using GLSL 4.30.
v2: add driinfo_gallium change (Emil Velikov)
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97852
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103801
Acked-by: Andres Gomez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103909
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
|
|
|
|
|
|
|
|
|
|
| |
Prepare for two texture handling paths, the descriptor-based
path will be added in a future commit. These are structured
so that the texture implementation handles its own state
emission.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
This needs to be shared between texture_plain and texture_desc.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
This will be shared with the texture descriptor path.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
| |
Need this to efficiently emit texture descriptor invalidations.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|
|
|
|
|
|
|
|
| |
Track varying component offset of the point size output, as well as
provide the offset of the point coord input.
Signed-off-by: Wladimir J. van der Laan <[email protected]>
Reviewed-by: Christian Gmeiner <[email protected]>
|