| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
|
| |
because the closed driver exposes it.
It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
| |
The main purpose for having NV_fragment_shader_interlock
extension is because that extension is also for GLES31 while
the ARB extension is for GL only.
Reviewed-by: Plamena Manolova <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This relaxes a number of ES shader restrictions allowing shaders
to follow more desktop GLSL like rules.
This initial implementation relaxes the following:
- allows linking ES shaders with desktop shaders
- allows mismatching precision qualifiers
- always enables standard derivative builtins
These relaxations allow Google Earth VR shaders to compile.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension provides new GLSL built-in functions
beginInvocationInterlockARB() and endInvocationInterlockARB()
that delimit a critical section of fragment shader code. For
pairs of shader invocations with "overlapping" coverage in a
given pixel, the OpenGL implementation will guarantee that the
critical section of the fragment shader will be executed for
only one fragment at a time.
Signed-off-by: Plamena Manolova <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NIR does not have these instructions. TGSI and Mesa IR both implement
them using < and >=, repsectively. Removing them deletes a bunch of
code and means I don't have to add code to the SPIR-V generator for
them.
v2: Rebase on 2+ years of change... and fix a major bug added in the
rebase.
text data bss dec hex filename
8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so before
8254235 268856 294072 8817163 868a0b 32-bit i965_dri.so after
7815339 345592 420592 8581523 82f193 64-bit i965_dri.so before
7813995 345560 420592 8580147 82ec33 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cloning was introduced in f81ede469910d to fix a problem with
shaders including IR that was owned by builtins.
However the approach of cloning the whole function each time we
reference a builtin lead to a significant reduction in the GLSL
IR compilers performance.
The previous patch fixes the ownership problem in a more precise
way. So we can now remove this cloning.
Testing on a Ryzen 7 1800X shows a ~15% decreases in compiling the
Deus Ex: Mankind Divided shaders on radeonsi (which take 5min+ on
some machines). Looking just at the GLSL IR compiler the speed up
is ~40%.
Tested-by: Dieter Nützel <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
Other ones are either unsupported or don't have any helper
function checks.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_mesa_glsl_has_builtin_function is used to determine whether any variant
of a builtin are available, for the purpose of enforcing the GLSL ES
3.00+ rule that overloads or overrides of builtins are disallowed.
However the builtin_builder contains information on all builtins,
irrespective of parse state, or versions, or extension enablement. As a
result we would say that a builtin existed even if it was not actually
available.
To resolve this, first check if at least one signature is available for
a builtin before returning true.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101666
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
Reviewed-by: Timothy Arceri <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't make sense to prefix them with 'image' because
they are called "Memory Qualifiers" and they can be applied
to members of storage buffer blocks.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Andres Gomez <[email protected]>
|
| |
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
| |
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
For both consistency and new bindless sampler types.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some versions of MinGW-w64 such as 5.3.1 and 6.2.0 produce bad code
with -O2 or -O3 causing a random driver crash when running programs
that use GLSL. Most Mesa demos in the glsl/ directory trigger the
bug, but not the fragcoord.c test.
Use a #pragma to force -O1 for this file for later MinGW versions.
Luckily, this is basically one-time setup code. I suspect the bug
is related to the sheer size of this file.
This should let us move to newer versions of MinGW-w64 for Mesa.
Reviewed-by: Jose Fonseca <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
| |
The underlying intrinsic is defined to always have a uvec2 return type.
Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Builtins are created once and allocated using their own private ralloc
context. When reparenting IR that includes builtins, we might be steal
bits of builtins. This is problematic because these builtins might now
be freed when the shader that includes then last is disposed. This
might also lead to inconsistent ralloc trees/lists if shaders are
created on multiple threads.
Rather than including builtins directly into a shader's IR, we should
include clones of them in the ralloc context of the shader that
requires them. This fixes double free issues we've been seeing when
running shader-db on a big multicore (72 threads) server.
v2: Also rename _mesa_glsl_find_builtin_function_by_name() to better
reflect how this function is used. (Ken)
v3: Rename ctx to mem_ctx (Ken)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."
Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute
This allows some fragment shaders in Dirt Rally to compile
Cc: "17.0" <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Juan A. Suarez Romero <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zero/infinity.
This addresses several issues of the current atan2 implementation:
- Negative zero (and negative denorms which end up getting flushed to
zero) isn't handled correctly by the current implementation. The
reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
on which side of the branch cut the argument is, which causes us to
return incorrect results (off by up to 2π) for very small negative
values.
- There is a serious precision problem for x values of large enough
magnitude introduced by the floating point division operation being
implemented as a mul+rcp sequence. This can lead to the quotient
getting flushed to zero in some cases introducing an error of over
8e6 ULP in the result -- Or in the most catastrophic case will
cause us to return NaN instead of the correct value ±π/2 for y=±∞
and x very large. We can fix this easily by scaling down both
arguments when the absolute value of the denominator goes above
certain threshold. The error of this atan2 implementation remains
below 25 ULP in most of its domain except for a neighborhood of y=0
where it reaches a maximum error of about 180 ULP.
- It emits a bunch of instructions including no less than three
if-else branches per scalar component that don't seem to get
optimized out later on. This implementation uses about 13% less
instructions on Intel SKL hardware and doesn't emit any control
flow instructions.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
v2: Use function inlining.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
v2: Use function inlining.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
If ARB_gpu_shader_int64 is supported, ARB_shader_clock also adds
clockARB() that returns a uint64_t. Rather than add new opcodes and
intrinsics for this, just wrap the existing intrinsic with a
packUint2x32.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These are all the allowed 64-bit functions from ARB_gpu_shader_int64
spec.
v2: restrict int64/double functions better.
v3 (idr): Delete spurious blank lines. Suggested by Matt.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
According to OpenGL Shading Language 4.50 spec, Section 8.7 "Vector
Relational Functions", functions of this type do not operate on scalar
types, so remove scalar types from signature definitions to make the
behavior consistent with glslangValidator and other drivers.
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Boyan Ding <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'
Just happened to notice this in a patch that was sent and included one
of the tokens in question.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The formula we have used in the past is a trivial reduction from the
definition by simply multiplying both the numerator and denominator of the
formula by 2. However, multiplying by e^x, you can further reduce it.
This allows us to get rid of one side of the clamp and two of exponential
functions which should make it faster. The new formula still passes the
dEQP precision tests for tanh so it should be fine.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clamp input scalar value to range [-10, +10] to avoid precision problems
when the absolute value of input is too large.
Fixes dEQP-GLES3.functional.shaders.builtin_functions.precision.tanh.* test
failures.
v2: added more explanation in the comment.
v3: fixed a typo in the comment.
Signed-off-by: Haixia Shi <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has apparently never existed in GLSL ES.
Fixes dEQP-GLES3.functional.shaders.texture_functions.invalid
.textureoffset_sampler2darrayshadow_vec4_ivec2_vertex and
.textureoffset_sampler2darrayshadow_vec4_ivec2_fragment
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98244
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just generate an __intrinsic_atomic_add with a negated parameter.
Some background on the non-obvious reasons for the the big change to
builtin_builder::call()... this is cribbed from some discussion with
Ilia on mesa-dev.
Why change builtin_builder::call() to allow taking dereferences and
create them here rather than just feeding in the ir_variables directly?
The problem is the neg_data ir_variable node would have to be in two
lists at the same time: the instruction stream and parameters. The
ir_variable node is automatically added to the instruction stream by the
call to make_temp. Restructuring the code so that the ir_variables
could be in parameters then move them to the instruction stream would
have been pretty terrible.
ir_call in the instruction stream has an exec_list that contains
ir_dereference_variable nodes.
The builtin_builder::call method previously took an exec_list of
ir_variables and created a list of ir_dereference_variable. All of the
original users of that method wanted to make a function call using
exactly the set of parameters passed to the built-in function (i.e.,
call __intrinsic_atomic_add using the parameters to atomicAdd). For
these users, the list of ir_variables already existed: the list of
parameters in the built-in function signature.
This new caller doesn't do that. It wants to call a function with a
parameter from the function and a value calculated in the function. So,
I changed builtin_builder::call to take a list that could either be a
list of ir_variable or a list of ir_dereference_variable. In the former
case it behaves just as it previously did. In the latter case, it uses
(and removes from the input list) the ir_dereference_variable nodes
instead of creating new ones.
text data bss dec hex filename
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so before
6036923 283160 28608 6348691 60df93 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so before
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This necessetated renaming the is_intrinsic field to _is_intrinsic. The
next commit will remove the field.
text data bss dec hex filename
6036507 283160 28608 6348275 60ddf3 lib64/i965_dri.so before
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
6037483 283160 28608 6349251 60e1c3 lib64/i965_dri.so before
6038043 283160 28608 6349811 60e3f3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This is now handled directly by ast_function.cpp.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by; Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
This is identical to OES_texture_cube_map_array support. dEQP has tests
which use this extension. Also it is part of AEP.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This has a separate enable flag because this extension also requires
OES_geometry_shader. It is possible that some drivers may support
OpenGL ES 3.1 and ARB_texture_cube_map but not support
OES_geometry_shader.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Iago Toral Quiroga <[email protected]>
|