| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At this moment that lowering is using info coming from the
UniformStorage, so for the ARB_gl_spirv codepath, it needs to be done
after calling gl_nir_link_uniforms. As for the GLSL codepath it can
also be called later, we just move the call on both cases, to avoid
adding several shader->spirv_data checks, and keep the patch as small
as possible.
This is the first patch needed to fix the following piglit tests:
tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test
tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test
but fixes thousands of tests when borrowing the tests from other specs
(that needs to be done manually right now).
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
To brw_nir_lower_gl_images, as it will be also used on the
ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the
lowering is about images following the OpenGL semantics. In any case
"brw_nir_lower_opengl_images" seemed too long to me, so I just used
gl. That shortening is already used on other parts of the code.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves the evergreen-specific max-sizes out as a driver-cap, so
other drivers with less strict requirements also can use hw-atomics.
Remove ssbo_atomic as it's no longer needed.
We should now be able to use hw-atomics for some stages and not for
other, if needed.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This gets rid of a r600 specific hack in the state-tracker, and prepares
for other drivers to be able to use hw-atomics.
While we're at it, clean up some indentation in the various drivers.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MaxAtomicCounters has already been assigned in the loop above in the
ssbo_atomic = true case, so this will calculate the same value as the
default.
While we're at it, fixup indentation on the MaxAtomicBufferBindings
assign.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes the code a bit easier to follow; we first set up
MaxShaderStorageBlocks, then we either set up a dedicated
MaxAtomicBuffers, or we split MaxShaderStorageBlocks in two.
While we're at it, also make the SSBO-splitting code tolerate the
hypothetical case of having an odd number of SSBOs without incorrectly
dropping the last SSBO.
This has the nice result that the SSBOs and atomic buffers are dealt
with almost completely orthogonally, easing some upcoming patches.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We're doing full c99 now, so there's no point in using the old boolean
type.
Signed-off-by: Erik Faye-Lund <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Gurchetan Singh <[email protected]>
|
|
|
|
|
|
|
|
|
| |
GL_STENCIL_INDEX uses GL_INTENSITY for the border color, which is nicer
to hardware that doesn't read the stencil border value from the X channel.
This fixes a bunch of dEQP tests on Vega & Raven.
Cc: 18.1 18.2 <[email protected]>
|
|
|
|
|
|
|
|
| |
Reported by Coverity: arr_live_ranges is freed in a different branch
than the one in which it was allocated.
Signed-off-by: Ernestas Kulik <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to internal docs, some gen9 platforms have a pixel shader push
constant synchronization issue. Although not listed among said
platforms, this issue seems to be present on the GeminiLake 2x6's we've
tested.
We consider the available workarounds to be too detrimental on
performance. Instead, we mitigate the issue by applying part of one of
the workarounds. Re-emit PUSH_CONSTANT_ALLOC at the top of every batch
(as suggested by Ken).
Fixes ext_framebuffer_multisample-accuracy piglit test failures with the
following options:
* 6 depth_draw small depthstencil
* 8 stencil_draw small depthstencil
* 6 stencil_draw small depthstencil
* 8 depth_resolve small
* 6 stencil_resolve small depthstencil
* 4 stencil_draw small depthstencil
* 16 stencil_draw small depthstencil
* 16 depth_draw small depthstencil
* 2 stencil_resolve small depthstencil
* 6 stencil_draw small
* all_samples stencil_draw small
* 2 depth_draw small depthstencil
* all_samples depth_draw small depthstencil
* all_samples stencil_resolve small
* 4 depth_draw small depthstencil
* all_samples depth_draw small
* all_samples stencil_draw small depthstencil
* 4 stencil_resolve small depthstencil
* 4 depth_resolve small depthstencil
* all_samples stencil_resolve small depthstencil
v2: Include more platforms in WA (Ken).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106865
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93355
Cc: <[email protected]>
Tested-by: Mark Janes <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This fixes a firefox crash.
Fixes: 781a78914c798dc64005b37c6ca1224ce06803fc
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This effectively reverts a26693493570a9d0f0fba1be617e01ee7bfff4db which
was a misguided attempt at protecting intel_query_dma_buf_modifiers from
invalid formats. Unfortunately, in some internal EGL cases, we can get
an SRGB format validly in this function. Rejecting such formats caused
us to not allow CCS in some cases where we should have been allowing it.
This regressed the performance of some SynMark tests as well as GfxBench
ALU2, Tessellation and Manhattan 3.0 tests
There's some question of whether or not we really should be using SRGB
"fourcc" formats that aren't actually in drm_foucc.h but there's not
much harm in allowing them through here.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223
Fixes: a26693493570 "i965/screen: Return false for unsupported..."
Tested-By: Eero Tamminen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
The spec seems clear this is not allowed but the Nvidia binary
forces apps to add layout qualifiers so this works around the
issue for No Mans Sky until the CTS can be sorted out.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec is quite clear this is not allowed:
From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers can appear in several forms of declaration.
They can appear as part of an interface block definition or
block member, as shown in the grammar in the previous section.
They can also appear with just an interface-qualifier to establish
layouts of other declarations made with that qualifier:
layout-qualifier interface-qualifier ;
Or, they can appear with an individual variable declared with
an interface qualifier:
layout-qualifier interface-qualifier declaration ;"
From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers cannot be used on formal function parameters,
and layout qualification is not included in parameter matching."
However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.
I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
We could enable it for lower versions of GL but this allows us
to just use the existing version/extension checks that are already
used by the core profile.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params. As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index. There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.
This also has the side-effect of making most image use compile-time
binding table indices. Previously, all image access pulled the binding
table index from a uniform. Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart. Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
instructions in affected programs: 115834 -> 113255 (-2.23%)
helped: 191
HURT: 0
total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
cycles in affected programs: 4757115 -> 4642085 (-2.42%)
helped: 73
HURT: 67
total spills in shared programs: 10951 -> 10926 (-0.23%)
spills in affected programs: 742 -> 717 (-3.37%)
helped: 7
HURT: 0
total fills in shared programs: 22226 -> 22201 (-0.11%)
fills in affected programs: 1146 -> 1121 (-2.18%)
helped: 7
HURT: 0
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GLSL spec allows you to set both the "readonly" and "writeonly"
qualifiers on images to indicate that it can only be used with
imageSize. However, we had no way of representing this int he linked
shader and flagged it as GL_READ_ONLY. This is good from a "does it use
this buffer?" perspective but not from a format and access lowering
perspective. By using GL_NONE for if "readonly" and "writeonly" are
both set, we can detect this case in the driver and handle it correctly.
Nothing currently relies on the type of surface in the "readonly" +
"writeonly" case but that's about to change. i965 is the only drier
which uses the ImageAccess field and gl_bindless_image::access is
currently unused.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Blending isn't valid for integer formats. Rather than having drivers
worry about this, just disable blending in this case. This hopefully
will increase hits in the CSO cache as well, by eliminating most of the
meaningless fields in this case.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL ES spec states:
"For normalized fixed-point rendering surfaces, the combination format
RGBA and type UNSIGNED_BYTE is accepted."
This fixes following failing VK-GL-CTS tests:
KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm
KHR-GLES3.packed_pixels.rectangle.rgba8_snorm
KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm
Signed-off-by: Tapani Pälli <[email protected]>
https://bugs.freedesktop.org/show_bug.cgi?id=107658
Cc: [email protected]
Reviewed-by: Marek Olšák <[email protected]>
Tested-by: Andres Gomez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.
z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.
v2: 1) Use better coding style (Ian Romanick)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.
v2: Remove unnecessary parentheses (Marek Olsak)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.
Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.
Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.
v2: 1) Remove unnecessary parentheses (Marek Olsak)
2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
GL_DEPTH_CLAMP only (Marek Olsak)
3) Clamp against near and far plane separately (Marek Olsak)
4) Clip point separately for near and far Z clipping plane (Marek
Olsak)
v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
and [0, max(n,f)] for far plane (Marek Olsak)
v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add some basic types and storage for the AMD_depth_clamp_separate
extension.
v2: 1) Drop unnecessary definition (Marek Olsak)
2) Expose extension in compatibility profile (Marek Olsak)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.
Signed-off-by: Kevin Rogovin <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When the SVBI Payload Enable is false I guess the register R1.4
which contains the Maximum Streamed Vertex Buffer Index is filled by zero
and GS stops to write transform feedback when the transform feedback
is not active.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.
Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0.
Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.
Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)
https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 095515e16ca3cb2c9f1813b6602ee57ae28325a8.
This breaks KHR-GL46.map_buffer_alignment.functional on i965.
This code was apparently not reviewed and I don't know why we would
move from a driver configurable constant to a hardcoded value for all
drivers. This really looks like an accidental hack push.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As far as I can tell, no one reviewed these changes, they made i965
assert fail on driver load, and I am not certain they are correct.
(Hopefully reverting these does not break radeonsi too badly...)
The uniform related changes seem fine and reasonable, but the texture
image units change is possibly incorrect. According to the
OES_tessellation_shader spec issue 5:
(5) How are aggregate shader limits computed?
RESOLVED: Following the GL 4.4 model, but we restrict uniform
buffer bindings to 12/stage instead of 14, this results in
MAX_UNIFORM_BUFFER_BINDINGS = 72
This is 12 bindings/stage * 6 shader stages, allowing a static
partitioning of the bindings even though at most 5 stages can
appear in a program object).
MAX_COMBINED_UNIFORM_BLOCKS = 60
This is 12 blocks/stage * 5 stages, since compute shaders can't
be mixed with other stages.
MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
This is 16 textures/stage * 6 stages.
which definitely is including compute shaders in that last limit.
Not including compute shaders breaks the following test:
dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger
There was enough breakage that I figured we should just send this back
to the drawing board.
Revert "i965: don't include compute resources in "Combined" limits"
Revert "st/mesa: don't include compute resources in "Combined" limits"
Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"
This reverts commit b03dcb1e5f507c5950d0de053a6f76e6306ee71f.
This reverts commit cff290df4c09547cd2cb3b129ec59bdebdadba90.
This reverts commit 45f87a48f94148b484961f18a4f1ccf86f066b1c.
|
|
|
|
|
| |
This is ASTC 2D LDR allowing texture arrays and 3D, compressing each
slice as a separate 2D image. Tested by piglit. Trivial.
|
|
|
|
| |
same cap as ARB_timer_query, no changes needed, tested by piglit
|
|
|
|
|
|
|
| |
because the closed driver exposes it.
It's the same as the ARB extension.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
it's a subset of the ARB extension.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
because the closed driver exposes it.
This is equivalent to the ARB extension.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
because the closed driver exposes it.
It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
It only contains GLSL changes.
v2: allow the layout qualifier on GLSL <= 1.30
|
|
|
|
|
|
|
|
|
|
|
| |
The combined limits should only include shader stages that can be active
at the same time. We don't need to include compute.
See also cff290df4c09547cd2cb3b129ec59bdebdadba90 for st/mesa.
Unbreaks i965 from assert failing on driver load since Marek's
45f87a48f94148b484961f18a4f1ccf86f066b1c, which dropped the core
Mesa capabilities before adjusting driver limits down to match.
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
The combined limits should only include shader stages that can be active
at the same time.
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
|
|
|
|
|
|
|
| |
radeonsi wants to report a different value
Reviewed-by: Ian Romanick <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
|