| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For GS input arrays, we may turn a packed_type of ivec4 into an
array of ivec4s. We still want flat qualification.
Found by inspection. Not known to help anything.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Allow that capability if the driver indicates that it is supported, and
flag whether images are read-only/write-only in the nir_variable (based
on the NonReadable and NonWritable decorations), which drivers may need
to implement this.
Signed-off-by: Alex Smith <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
As soon as we support shaderStorageImageWriteWithoutFormat we can see
write-only images (sampled == 2) that don't have a format specified.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Passes the newly added piglit test for this extension on i965.
V2: Fix comments by Ilia.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TCS and TES inputs without an array size are implicitly sized to
gl_MaxPatchVertices. But TCS outputs are apparently not:
"If no size is specified, it will be taken from the output patch size
(gl_VerticesOut) declared in the shader."
Fixes dEQP-GLES31.functional.program_interface_query.program_output.
array_size.separable_tess_ctrl.var.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
| |
OpenGL ES actually has spec text to prohibit this. It's just OpenGL
that's confusing.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
In desktop GLSL it is allowed to have unsized-arrays that are
not last, as long as we can determine that they are implicitly
sized, which is detected at link-time.
With this patch Mesa reports a compilation error as glslang does with
the following shader:
buffer SSBO { vec4 data[]; vec4 moreData;};
void main (void)
{
}
Fixes:
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
Cc: "17.0" <[email protected]>
Signed-off-by: Jose Maria Casanova Crespo <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously if you used MESA_GL_VERSION_OVERRIDE=3.3COMPAT, Mesa exposed
an OpenGL 3.3 compatibility profile context (with various unimplemented
features and bugs), but still refused to compile shaders with
#version 330 compatibility
This patch simply adds a small bit of plumbing to let that through.
Of course the same caveats apply: compatibility profile is still not
supported (and will not be supported), so there are no guarantees that
anything will work.
Tested-by: Dylan Baker <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99660
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
This will be used to skip checking the cache and force a recompile.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the on-disk shader cache we want to be able to differentiate
between a program that was linked and one that was loaded from cache.
V2:
- don't return the new enum directly to the application when queried,
instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome
corruptions when using cache.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99465
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."
Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute
This allows some fragment shaders in Dirt Rally to compile
Cc: "17.0" <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of zero/infinity.
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.
Fixes the following Vulkan CTS tests on Intel hardware:
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4
Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet. The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value. The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zero/infinity.
This addresses several issues of the current atan2 implementation:
- Negative zero (and negative denorms which end up getting flushed to
zero) isn't handled correctly by the current implementation. The
reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
on which side of the branch cut the argument is, which causes us to
return incorrect results (off by up to 2π) for very small negative
values.
- There is a serious precision problem for x values of large enough
magnitude introduced by the floating point division operation being
implemented as a mul+rcp sequence. This can lead to the quotient
getting flushed to zero in some cases introducing an error of over
8e6 ULP in the result -- Or in the most catastrophic case will
cause us to return NaN instead of the correct value ±π/2 for y=±∞
and x very large. We can fix this easily by scaling down both
arguments when the absolute value of the denominator goes above
certain threshold. The error of this atan2 implementation remains
below 25 ULP in most of its domain except for a neighborhood of y=0
where it reaches a maximum error of about 180 ULP.
- It emits a bunch of instructions including no less than three
if-else branches per scalar component that don't seem to get
optimized out later on. This implementation uses about 13% less
instructions on Intel SKL hardware and doesn't emit any control
flow instructions.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Will avoid a regression in a future commit that introduces some
additional rcp operations. According to the GLSL 4.10 specification:
"Dividing by 0 results in the appropriately signed IEEE Inf."
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The `end+1` skips the ']', whereas the `strlen+1` includes the final
'\0' in the move to terminate the string.
Cc: [email protected]
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:
MESA_GLSL_CACHE_ENABLE=1
In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.
The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
Correctly handled by all the build systems.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC warns about different const qualifiers. Add the extra const to
silence it.
nir_phi_builder.c(244) : warning C4090: 'initializing' : different 'const' qualifiers
nir_phi_builder.c(245) : warning C4090: 'initializing' : different 'const' qualifiers
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
MSVC warns about implicit conversion as below. Annotate the literal
appropriately to silence the warning.
nir_gather_info.c(249) : warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-VK.spirv_assembly.instruction.compute.opspecconstantop.vector_related
dEQP-VK.spirv_assembly.instruction.graphics.opspecconstantop.vector_related*
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looking at the following bit of SPIRV shader :
...
%zero = OpConstant %i32 0
%ivec3_0 = OpConstantComposite %ivec3 %zero %zero %zero
%vec3_undef = OpUndef %ivec3
%sc_0 = OpSpecConstant %i32 0
%sc_1 = OpSpecConstant %i32 0
%sc_2 = OpSpecConstant %i32 0
...
Our compiler currently stops parsing variables & types on the OpUndef
and switches to instructions, leaving the following sc_[0-2] variables
untreated.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
builtin (SampleMask). The only way to tell which one we are dealing with
is to check if it is an input or an output.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original number was chosen in an attempt to match the limits applied to
GLSL IR.
A look at the git history of the why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.
No shader-db change, however a furture change will disbale the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the constant array would not get copy propagated until the backend
did its GLSL IR opt loop. I plan on removing that from i965 shortly which
caused huge regressions in Deus-ex and Tomb Raider which have large
constant arrays. Moving lowering before the opt loop in the GLSL linker
fixes this and unexpectedly improves some compute shaders also.
shader-db results BDW:
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 204 -> 194 (-4.90%)
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1010 -> 741 (-26.63%)
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 542 -> 385 (-28.97%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1831382 -> 1818492 (-0.70%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 216238 -> 206180 (-4.65%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 18484 -> 16644 (-9.95%)
total instructions in shared programs: 13060313 -> 13059877 (-0.00%)
instructions in affected programs: 1756 -> 1320 (-24.83%)
helped: 3
HURT: 0
total cycles in shared programs: 256586698 -> 256561910 (-0.01%)
cycles in affected programs: 2066104 -> 2041316 (-1.20%)
helped: 3
HURT: 0
V3: only call the opt loop if lowering progressed (Suggested by Eric)
V2: call opts before and after lowering (Suggested by Ken)
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).
Otherwise, some mingw build errors out (since
8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and
bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;
(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
fprintf(f, "%a", ir->value.f[i]);
^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Tested-by: Glenn Kennard <[email protected]>
Tested-by: James Harvey <[email protected]>
Cc: 17.0 <[email protected]>
|
|
|
|
|
| |
Fixes regression caused by cbeba6bd48da2c. I accidentally pushed the
wrong version of the patch.
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
There are some line wrapping violations here but those lines will get
deleted in the following patch.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.
The new field will be used in a following patch for another
clean-up.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code always compared integers as 64-bit. Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want. Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit. While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.
This gets the UE4 bitfield_extract optimization working again. It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
basetsd.h on Windows defines INT64 and UINT64 typedefs which conflict
with these. Append "_TOK" to avoid conflicts.
Should fix the Windows build.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Don't up-convert the shift count parameter if shift instructions.
Suggested by Connor. Add type_is_singed() function. This will make
adding 8- and 16-bit types easier.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: Jason Ekstrand <[email protected]>
|