| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
This will allow us to later split gl_shader into two structs.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
This will allow us to split gl_shader into two different structs, one for
shader objects and one for linked shaders.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Rather than passing in gl_shader we now pass in the IR. This will
allow us to later split gl_shader into two structs. One for use
as a linked per stage shader struct and one for use as a GL shader
object.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only part of an ir_texture which can be an array is the
offsets array in textureGatherOffsets() calls. We don't want
to lower those, because they're required to remain constants.
Fixes textureGatherOffsets with Gallium drivers such as llvmpipe,
which commit ef78df8d3b0cf540e5f08c8c2f6caa338b64a6c7 regressed.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The intent was to continue down the indirect chain, not to call ourselves
with unchanged input arguments. Found by code inspection, and comparison
to copy_prop_alu_src().
We haven't hit this because callers of NIR's copy prop are doing so in
SSA, before indirect variable dereferences have been lowered to registers.
Reviewed-by: Rob Clark <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
It defaults to true so default behavior doesn't change but it allows you to
do NIR_VALIDATE=false if you don't want validation. Disabling validation
can substantially speed up shader compiles so you frequently want to turn
it off if compiler invariants aren't in question.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by
v2:
* proper commit message and non-joke title;
* replace two 'as is' followed by 'is' to 'as-is'.
v3:
* 'a integer' => 'an integer' and similar (originally spotted by
Jason Ekstrand, I fixed a few other similar ones while at it)
Signed-off-by: Giuseppe Bilotta <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Constant propagation on arrays doesn't make a lot of sense. If the
array is only accessed with constant indexes, then opt_array_splitting
would split it up. Otherwise, we have variable indexing. If there's
multiple accesses, then constant propagation would end up replicating
the data.
The lower_const_arrays_to_uniforms pass creates uniforms for each
ir_constant with array type that it encounters. This means that it
creates redundant uniforms for each copy of the constant, which means
uploading too much data. It can even mean exceeding the maximum number
of uniform components, causing link failures.
We could try and teach the pass to de-duplicate the data by hashing
constants, but it makes more sense to avoid duplicating it in the first
place. We should promote constant arrays to uniforms, then propagate
the uniform access.
Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum
number of uniform components by a huge margin and failed to link.
On Broadwell:
total instructions in shared programs: 9067702 -> 9068202 (0.01%)
instructions in affected programs: 10335 -> 10835 (4.84%)
helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
HURT: 20 (Natural Selection 2)
loops in affected programs: 4 -> 0
The hurt programs appear to no longer have a constarray uniform, as
all constants were successfully propagated. Apparently before this
patch, we successfully unrolled a loop containing array access, but
only after promoting constant arrays to uniforms. With this patch,
we unroll it first, so all array access is direct, and the array
is split up, and individual constants are propagated. This seems
better.
Cc: [email protected]
Reported-by: Karol Herbst <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's really no point in looking at ir_dereference_array of a
constant. It also misses cases like:
(assign () (var_ref tmp) (constant (array ...) ...))
No changes in shader-db, but keeps it working after the next commit.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The new uniform may need precise as well.
Fixes copy propagation of constant array uniforms in Tomb Raider shaders.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we failed to split constant arrays. Code such as
int[2] numbers = int[](1, 2);
would generates a whole-array assignment:
(assign () (var_ref numbers)
(constant (array int 4) (constant int 1) (constant int 2)))
opt_array_splitting generally tried to visit ir_dereference_array nodes,
and avoid recursing into the inner ir_dereference_variable. So if it
ever saw a ir_dereference_variable, it assumed this was a whole-array
read and bailed. However, in the above case, there's no array deref,
and we can totally handle it - we just have to "unroll" the assignment,
creating assignments for each element.
This was mitigated by the fact that we constant propagate whole arrays,
so a dereference of a single component would usually get the desired
single value anyway. However, I plan to stop doing that shortly;
early experiments with disabling constant propagation of arrays
revealed this shortcoming.
This patch causes some arrays in Gl32GSCloth's geometry shaders to be
split, which allows other optimizations to eliminate unused GS inputs.
The VS then doesn't have to write them, which eliminates the entire VS
(5 -> 2 instructions). It still renders correctly.
No other change in shader-db.
v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_constant_propagation.cpp contains constant folding code which can
actually do constant propagation in some cases. It was happily
propagating constants into the left-hand-side of assignments.
For example,
(assign () (var_ref temp) (constant ...))
would brilliantly be turned into:
(assign () (constant ...) (constant ....))
This is a bigger hammer than necessary - it prevents propagation
into the left-hand-side altogether. We could certainly do better
someday. Notably, the constant propagation pass itself already
takes this approach - it's just the constant propagation pass's
built-in constant folding code (which actually propagates, too)
that was broken.
No change in shader-db, but prevents regressions after future commits.
It seems plausible that this could be hit today, but I haven't seen it
happen.
Cc: [email protected]
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already store these in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We already store this in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
V2: remove unnecessary NULL check
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Iago Toral <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "11.1 11.2 12.0" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
SPIR-V treats it as an input but NIR wants the system value. This
shouldn't have been too much of a surprise given that we have to do the
same conversion in the GLSL IR to NIR pass.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This pass is similar to propagate_invariance in the GLSL compiler. The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original. For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite. However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).
This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2. This is done 6 or 8 times and summed to produce
the final result which is written to a red texture. In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously some callers of precision_qualifier_allowed would strip the
arrayness from the type and some would not. As a result, some places
would not notice that float[6], for example, needed a precision
qualifier.
Fixes the new piglit test no-default-float-array-precision.frag.
Signed-off-by: Ian Romanick <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96358
Cc: "12.0" <[email protected]>
Cc: Gregory Hainaut <[email protected]>
Cc: Ilia Mirkin <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead use the internal gl_shader_stage enum everywhere. This
makes things more consistent and gets rid of unnecessary
conversions.
Ideally it would be nice to remove the Type field from gl_shader
altogether but currently it is used to differentiate between
gl_shader and gl_shader_program in the ShaderObjects hash table.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
i965 has no special hardware for this, so the best way to implement
this is to pass it in via a uniform.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
| |
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Built-in variable "MaxCombinedShaderStorageBlocks" was added to GLSL 4.40
revision 9.
Section "1.2.1 Changes since revision 8 of GLSL version 4.40",
page 3 of the PDF states:
"Bug 11734: Add gl_MaxCombinedShaderOutputResources and mark
gl_MaxCombinedImageUnitsAndFragmentOutputs as deprecated."
Fixes: GL44-CTS.shader_image_load_store.basic-glsl-const
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This check was removed in 5b2675093e86 add it back in.
Reviewed-by: Dave Airlie <[email protected]>
Cc: "12.0" <[email protected]>
https://bugs.freedesktop.org/show_bug.cgi?id=96349
|
|
|
|
|
|
|
|
|
|
|
| |
This change makes sure to remove arrays when checking if type
is a double.
The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were we not reserving the last component.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.
Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations instead.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Adding 64-bit integers support was going to make this file worse,
just remove the tabs from it now.
Acked-by: Timothy Arceri <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
In the future int64 support will have the same requirements.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This is prep work for int64 support.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This is just prep work for int64 support, changing
places where 64-bit matters no doubles.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
This just moves code to the new check in advance of int64 support.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This adds an inline and type query for if a type is 64-bit.
Fow now this is equivalent to double, but int64 will change
this.
Reviewed-by: Ilia Mirkin <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.
These uniform types don't use this storage.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Could cause issues if you tried to read from an uninitialised pointer.
This just initalises the pointer to null to avoid that being a problem.
Discovered by Coverity.
CID: 1343616
Signed-off-by: Jakob Sinclair <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are basically a wash:
No change in instruction counts.
total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
cycles in affected programs: 2102646 -> 2097396 (-0.25%)
helped: 48
HURT: 83
I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are not terribly impressive:
total instructions in shared programs: 9008589 -> 9008613 (0.00%)
instructions in affected programs: 4293 -> 4317 (0.56%)
helped: 0
HURT: 10
total cycles in shared programs: 78550978 -> 78575760 (0.03%)
cycles in affected programs: 655426 -> 680208 (3.78%)
helped: 75
HURT: 88
GAINED: 2
Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one. That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.
This patch is also useful for a couple of reasons:
1. It can clean up copies created by varying packing (previously, we
couldn't if the uses were inside a loop).
This fixes a bug when interpolateAt*() is used on a packed varying
inside a loop: glsl_to_nir struggles to see through the extra copy
and mistakenly believed the variable was not an input.
2. It will help propagate uniform array access created by
lower_const_array_to_uniforms().
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."
Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max
Reviewed-by: Alejandro Piñeiro <[email protected]>
Cc: "11.2 12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although the glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.
This makes things a bit nicer in gdb, and improves type
safety.
v2: fix a few pieces of interface I missed that caused some
piglit regressions.
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().
Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in
v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: "12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "11.2 12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };
The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.
v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <[email protected]>
Cc: "11.2 12.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Getting rid of the default case makes the compiler warn if we are missing
cases. While we're here, we also add the one missing case.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
glslang frequently throw bogus decorations into shaders. While we are free
to assert-fail, it's a bit nicer to the application to just warn.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously we supported a subset of capabilities and just left a default
case for the others. It's time to stop being lazy and actually audit the
capabilities. This should bring them up-to-date with reality.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|