| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should be more efficient than the previous snprintf() solution.
But more importantly, it avoids a buffer overflow bug that could result
in crashes or unpredictable results when processing very large interface
blocks.
For the app in question, key->length = 103 for some interfaces. The check
if size >= sizeof(hash_key) was insufficient to prevent overflows of the
hash_key[128] array because it didn't account for the terminating zero.
In this case, this caused the call to hash_table_string_hash() to return
different results for identical inputs, and then shader linking failed.
This new solution also takes all structure fields into account instead
of just the first 15 when sizeof(pointer)==8.
Cc: [email protected]
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 31667e6237d30188d0b29e17f5b9892f10c0d83a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Required by the i965 driver.
v2:
- Split out the nir_builder_opcodes.h rules.
- Do not unconditionally hide the python command - use $(hide)
- Use LOCAL_EXPORT_C_INCLUDE_DIRS to manage includes for the generated
sources.
Cc: "10.5" <[email protected]>
[Emil Velikov: Split from a larger commit, v2]
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
(cherry picked from commit 06619749a11651a50e353168c7c793082820585d)
|
|
|
|
|
|
|
|
|
|
| |
Many parts of mesa already have the include with others depending on it
but it's missing. Add it once at the top makefile and be done with it.
Cc: "10.4 10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Chih-Wei Huang <[email protected]>
(cherry picked from commit 6fb801786604c270fae99c3d665dcebaa0bff3a6)
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Cc: "10.5" <[email protected]>
(cherry picked from commit bc672e261c5f7ff56cd2b8f6b518ebfdc0163bb7)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
in different fragment shaders. This also applies to a case when gl_FragCoord
is redeclared with no layout qualifiers in one fragment shader and not
declared but used in other fragment shader.
Signed-off-by: Anuj Phogat <[email protected]>
Khronos Bug#12957
Cc: "10.5" <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
(cherry picked from commit d8208312a3a200b4e6d71ce533d835b2d705234a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch changes lowering pass to use unique name for each uniform
so that arrays from different stages cannot end up having same
name.
v2: instead of global counter, use pointer to achieve
unique name (Kenneth Graunke)
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89590
Reviewed-by: Chris Forbes <[email protected]>
Cc: 10.5 10.4 <[email protected]>
(cherry picked from commit 3cf99701ba6c9e56c9126fdbb74107a31ffcbcfb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The optimization done by commit 34ec1a24d did not take it into account.
Fixes:
dEQP-GLES3.functional.shaders.random.all_features.fragment.20
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.4 10.5" <[email protected]>
(cherry picked from commit b43bbfa90ace596c8b2e0b3954a5f69924726c59)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/
array-selection.vert test contains the following code:
gl_Position = (pick_from_a_or_b ? a : b)[i];
where "a" and "b" are uniform vec4[2] variables.
ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and
generates an if-block to copy one or the other:
(declare (temporary) (array vec4 2) conditional_tmp)
(if (var_ref pick_from_a_or_b)
((assign () (var_ref conditional_tmp) (var_ref a)))
((assign () (var_ref conditional_tmp) (var_ref b))))
However, we failed to update max_array_access for "a" and "b", so it
remained 0 - here, the whole array is being accessed. At link time,
update_array_sizes() used this bogus information to change the types
of "a" and "b" to vec4[1]. We then had assignments from a vec4[1] to
a vec4[2], which is highly illegal.
This tripped assertions in nir_split_var_copies with scalar VS.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
(cherry picked from commit 9f1e250e77ebd9255bbd9a83bd68c9e4068c2aab)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were some bugs, and the code was really difficult to follow. We
would optimize
min(max(x, b), 1.0) into max(sat(x), b)
but not pay attention to the order of min/max and also do
max(min(x, b), 1.0) into max(sat(x), b)
Corrects four shaders from Champions of Regnum that do
min(max(x, 1), 10)
and corrects rendering of Mass Effect under VMware Workstation.
Cc: "10.4 10.5" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89180
Reviewed-by: Abdiel Janulgue <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit cb25087c7bd5f1ad2515647278b32d3f07803f77)
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling in C99 or C++11 modes, Solaris defines isnormal() as
a macro via <math.h>, which causes the function definition to become
too mangled to compile.
Signed-off-by: Alan Coopersmith <[email protected]>
Cc: "10.5" <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
(cherry picked from commit d602fbd861e2c3c5570b55f0839361a6f8bd32c7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The macro is defined to provide a trailing ; so this caused the expansion
to end in ";;" which made the Solaris Studio compilers issue warnings for
every line of:
"builtin_type_macros.h", line 113: Warning: extra ";" ignored.
for every file that included the header, filling build logs with thousands
of useless warnings.
Signed-off-by: Alan Coopersmith <[email protected]>
Cc: "10.5" <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
(cherry picked from commit 815b3bd096a3eab9f00f9270d45a6885d73180e9)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_copy_propagation and opt_copy_propagation_elements create new ACP
and Kill sets each time they enter a new control flow block. For if
blocks, they also copy the entire existing ACP set contents into the
new set.
When we exit the control flow block, we discard the new sets. However,
we weren't freeing them - so they lived on until the pass finished.
This can waste a lot of memory (57MB on one pessimal shader).
This patch makes the pass allocate ACP entries using this->acp as the
memory context, and Kill entries out of this->kill. It also steals
kill entries when moving them from the inner kill list to the parent.
It then frees the lists, including their contents.
v2: Move ralloc_free(this->acp) just before this->acp = orig_acp
(suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Cc: "10.5 10.4" <[email protected]>
(cherry picked from commit 76960a55e6656bb0022e9c31ae7542010da130e3)
|
|
|
|
|
|
|
| |
Cc: "10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Use nir/nir_opcodes.h as is (w/o the absolute path), as it is the target
name used to generate the actual file. Otherwise the target is missing,
the file won't get generated and the build will fail.
Cc: "10.5" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes phi nodes whose sources all point to the same thing.
Shader-db results:
total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs: 126564 -> 122480 (-3.23%)
helped: 615
HURT: 0
total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs: 24622 -> 23174 (-5.88%)
helped: 138
HURT: 0
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 Jason Ekstrand <[email protected]>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <[email protected]>:
- Rework the way we determine whether or not to sccalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <[email protected]>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <[email protected]>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750) 0.0000000447
mod(121.57, 13.29) 0.0000023842
mod(3769.12, 321.99) 0.0000762939
mod(3769.12, 1321.99) 0.0001220703
mod(-987654.125, 123456.984375) 0.0160663128
mod( 987654.125, 123456.984375) 0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 5998190 -> 5997603 (-0.01%)
instructions in affected programs: 54276 -> 53689 (-1.08%)
helped: 293
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
total instructions in shared programs: 5998321 -> 5998287 (-0.00%)
instructions in affected programs: 4520 -> 4486 (-0.75%)
helped: 8
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
the search
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This allows us to indicate a concept of an invalid type.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <[email protected]> (v2)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <[email protected]> (v3)
|
|
|
|
|
| |
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit d7d340fb2f68c46bd5a0008ecf53c6693e29c916.
We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
Reviewed-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
| |
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <[email protected]>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <[email protected]>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Connor Abbott <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
v3 Jason Ekstrand <[email protected]>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
|