| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
We switched from a boolean to array lengths in gl_program a while back.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We weren't taking first_component into account when handling GS push
inputs. We hardly ever push GS inputs, so this was not caught by
existing tests. When I started using component qualifiers for the
gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size
started catching this.
Cc: "13.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
I forgot to delete this in 9ef2b9277d3bead6dbfa47e95794ca61e8be4e84.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
So much of this code was cut and pasted per stage. We can accomplish
much of it by looping over shader stages.
Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049%
(n = 71) at 1024x768 on Cherryview.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The context fields are for Gen4-5; setting them has always been useless.
There's no point in spending the cost in the hottest path in the driver.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Matt intentionally switched the VS calculation to be float-based in
commit c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support
was written before this and rebased forward, and missed the change.
Now it's consistent.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
If geometry/tessellation shaders are disabled, prog_data will be NULL
(see brw_state_upload.c). This consolidates dirty bits a little.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
gl_shader_program
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
This will be used to share data between gl_program and gl_shader_program
allowing for greater code simplification as we can remove a number of
awkward uses of gl_shader_program.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no point in performing depth writes when the depth test
comparison function is set to GL_EQUAL - it would just write out
the same value that's already there (if it is written at all). While
this is harmless from a functional perspective, it hurts performance.
Obviously, writing to memory is not free, but there's another more
subtle impact as well: it can prevent early depth optimizations.
Depth writes aren't supposed to happen for pixels that are killed
by fragment shader discard statements or the alpha test. So, with
depth writes enabled and either of those, the pixel shader must be
invoked to determine whether or not to perform the write. This is
fairly stupid in the EQUAL case - we're running a shader to decide
whether to replace the existing depth value with itself.
By disabling these pointless writes, we allow early depth even with
discards and alpha testing, allowing the hardware to skip the pixel
shader altogether if the depth test fails.
Improves performance of Unigine Valley:
- Skylake GT2: +17.8%
- Broadwell GT3e: +11.5%
- Cherrytrail: +19.4%
Huge thanks to Mark Janes for building frameretrace [1], the performance
analysis tool that helped us find this issue, and to Robert Bragg for
providing us performance metrics on Linux. Mark also spent the time to
analyze Valley performance on Windows vs. Linux and discovered a
discrepancy in early depth test metrics. Once he had isolated a draw
call and drawn attention to the problem, fixing it was pretty simple.
[1] https://github.com/janesma/apitrace/wiki/frameretrace-branch
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow us to directly store metadata we want to retain in
gl_program this metadata is currently stored in gl_linked_shader and
will be lost if relinking fails even though the program will remain
in use and is still valid according to the spec.
"If a program object that is active for any shader stage is re-linked
unsuccessfully, the link status will be set to FALSE, but any existing
executables and associated state will remain part of the current
rendering state until a subsequent call to UseProgram,
UseProgramStages, or BindProgramPipeline removes them from use."
This change will also help avoid the double handing that happens in
_mesa_copy_linked_program_data().
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In i965 we were calling _mesa_reference_program() after creating
gl_program and then later calling it again with NULL as a param
to get the refcount back down to 1. This changes things to not
use _mesa_reference_program() at all and just have gl_linked_shader
take ownership of gl_program since refcount starts at 1.
The st and ir_to_mesa linkers were worse as they were both getting
in a state were the refcount would never get to 0 and we would leak
the program.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Framebuffer attachments can be specified through FramebufferTexture*
calls. Upon specifying a depth (or stencil) framebuffer attachment that
internally reuses a texture, the cube map face of the new attachment
would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X).
Fix this issue by actually updating the CubeMapFace field.
This bug manifested itself in BindFramebuffer calls performed on
framebuffers whose stencil attachments internally reused a depth
texture. When binding a framebuffer, we walk through the framebuffer's
attachments and update each one's corresponding gl_renderbuffer. Since
the framebuffer's depth and stencil attachments may share a
gl_renderbuffer and the walk visits the stencil attachment after
the depth attachment, the uninitialized CubeMapFace forced rendering
to TEXTURE_CUBE_MAP_POSITIVE_X.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension can be enabled automatically as it is a subset of
ARB_shader_image_load_store.
v2: Replace helper function by qualifier struct field (Ilia)
Enable NV_image_formats using ARB_shader_image_load_store (Ilia)
v3: Drop extension field from gl_extensions (Ilia)
Release notes (Ilia)
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
These changes were missed in 0ad69e6b5.
Acked-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98767
|
|
|
|
|
|
|
| |
Mark variables and static functions that only occur in assert()s as
MAYBE_UNUSED.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
We called it immediately prior, so re-use the previously returned value.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some differences between how non-multisampled framebuffers (i.e.
samples == 0) and multisampled framebuffers with a single sample should be
treated. For example, alpha to coverage and writing to gl_SampleMask has an
effect with single-sample multisample framebuffers, but not on
non-multisample framebuffers.
This fixes GL45-CTS.sample_variables.mask.*.samples_1.* at least for
Gallium drivers (and possibly others, though at least radeonsi needs an
additional fix).
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
| |
I deleted this function in 59864e8e02057cc6fa0448a8af067a3cf53389da.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case we have empty log (""), we should return 0. This fixes
Khronos WebGL conformance test 'program-infolog'.
From OpenGL ES 3.1 (and OpenGL 4.5 Core) spec:
"If pname is INFO_LOG_LENGTH , the length of the info log, including
a null terminator, is returned. If there is no info log, zero is
returned."
v2: apply same fix for get_shaderiv and _mesa_GetProgramPipelineiv (Ian)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]> (v1)
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97321
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Found by the piglit 'fbo-depth-array stencil-clear' test when
implementing blorp blit splitting for gen7.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary
limitation, so this failed to compile. Apparently glibc does not
enforce PATH_MAX restrictions anyway, so it's kind of a hoax:
https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html
MSVC uses a different name (_MAX_PATH) as well, which is annoying.
We don't really need it. We can simply asprintf() the filenames.
If the filename exceeds an OS path limit, presumably fopen() will
fail, and we already check that. (We actually use ralloc_asprintf
because Mesa provides that everywhere, and it doesn't look like we've
provided an implementation of GNU's asprintf() for all platforms.)
Fixes the build on GNU/Hurd.
Cc: "13.0" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
Signed-off-by: Samuel Thibault <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes crashes when starting Deus Ex: Mankind Divided.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, blorp copy operations were CCS-unaware so you had to perform
resolves on the source and destination before performing the copy. This
commit makes blorp_copy capable of handling CCS-compressed images without
any resolves.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This commit moves the handling of resolves into blorp_surf_for_miptree().
Instead of each helper doing resolves and checks itself, it simply tells
blorp_surf_for_miptree which aux modes are supported by the given blorp
operation and blorp_surf_for_miptree will resolve as-needed.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
Eventually, we may want to just have a single blorp_ccs_op function that
does both clears and resolves. For now we'll stick to just making the
ccs_resolve function we have now a bit more configurable.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cd724208d3e1e3307f84a794f2c1fc83b69ccf8a added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.
This fixes a hang in glsl-fs-loop piglit test with MESA_DEBUG=context.
v2:
- Remove unrelated changes.
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It's common for games to compile 2000 programs or more so at
32bits x 2000 programs x 22 fields x 2 (at least) stages
This should give us something like 352 kilobytes in savings
once we add some more glsl only fields.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Since gl_program is now created with rzalloc() they should
already be initialised.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow us to move the ARB asm fields in gl_program into
a union as we will be able call ralloc_free() on the entire struct
when destroying the context.
In this change we switch over to using ralloc for the Instructions,
String and LocalParams fields of gl_program.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
This is a step towards freeing gl_linked_shader after linking.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
This is a step towards freeing gl_linked_shader after linking.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
| |
Since we started releasing GLSL IR after linking the only time we can
print GLSL IR is during linking. When regenerating variants only NIR
will be available.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This partially reverts commit 0d241085f723402120b4b47e939fe77020a16d80.
HiZ buffer cannot do this properly now, and it's not required, so remove
it.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When 1 BO is used for aux data, it needs to point to the correct offset,
which will not be the BOs offset but instead an offset from the BOs
offset. Since today there are always multiple BOs for aux, this doesn't
actually change anything.
Cc: Jason Ekstrand <[email protected]>
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A number of drivers report useful debug/perf information accessible
through GL_ARB_debug_output and with debug contexts (i.e. setting the
GLX_CONTEXT_DEBUG_BIT_ARB flag). But few applications actually use
the GL_ARB_debug_output extension.
This change lets one set the MESA_DEBUG env var to "context" to force-set
a debug context and report debug/perf messages to stderr (or whatever
file MESA_LOG_FILE is set to). This is a useful debugging tool.
The small change in st_api_create_context() is needed so that
st_update_debug_callback() gets called to hook up the driver debug
callbacks when ST_CONTEXT_FLAG_DEBUG was not set, but MESA_DEBUG=context.
v2: use %.*s format string instead of allocating temporary buffer.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By using _mesa_image_address, the code becomes simpler _and_ fixes the bug
that GL_PACK_SKIP_IMAGES was applied even on non-3D textures.
Also, converting a whole slice at a time simplifies the format translation
fallback path.
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore.
v2: fix a silly mistake during code movement
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes use cases like glReadPixels from an RGBA8I framebuffer into
a PBO with type GL_INT by clamping values appropriately when they fall
outside the range of the destination format.
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
| |
For consistency with st_pbo_get_download_fs.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using the GPU download path, we bind the PBO as a buffer texture,
so call is_format_supported accordingly. On radeonsi, this means that
GPU downloads aren't used for UNSIGNED_SHORT_5_6_5 destinations, for
example.
Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|