| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Reported-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug in code motion that occurred when the best block is the
same as the schedule early block. In this case, because we're checking
(lca != def->parent_instr->block) at the top of the loop, we never get to
the check for loop depth so we wouldn't move it out of the loop. This
commit reworks the loop to be a simple for loop up the dominator chain and
we place the (lca != def->parent_instr->block) check at the end of the
loop.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once again, SPIR-V is insane... It allows you to place "patch"
decorations on structure members. Presumably, this is so that you can
do something such as
out struct S {
layout(location = 0) patch vec4 thing1;
layout(location = 0) vec4 thing2;
} str;
And have your I/O "nicely" organized. While this is a bit silly, it's
allowed and well-defined so whatever. Where it really gets interesting
is when you have an array of struct. SPIR-V says nothing about not
allowing you to have those qualifiers on the members of a struct that's
inside an array and GLSLang does this. Specifically, if you have
layout(location = 0) out patch struct S {
vec4 thing1;
vec4 thing2;
} str[2];
then GLSLang will place the "patch" decorations on the struct members.
This is ridiculous there is no way that having some of them be patch and
some not would be well-defined given that patch and non-patch outputs
are in effectively different storage classes. This commit moves around
the way we handle the "patch" decoration so that we can detect even the
crazy cases and handle them.
Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
| |
We don't want to do anything for the other cases.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we will end up with an extra instruction to compare the
result of the inot.
On BDW:
total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0
total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.
On BDW:
total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2
total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240
LOST: 0
GAINED: 1
The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BDW:
total instructions in shared programs: 13061890 -> 13061877 (-0.00%)
instructions in affected programs: 2441 -> 2428 (-0.53%)
helped: 13
HURT: 0
total cycles in shared programs: 256612254 -> 256611784 (-0.00%)
cycles in affected programs: 16418 -> 15948 (-2.86%)
helped: 10
HURT: 2
V2: don't use ffma directly
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tries to move comparisons (a common source of boolean values)
closer to their first use. For GPUs which use condition codes,
this can eliminate a lot of temporary booleans and comparisons
which reload the condition code register based on a boolean.
V2: (Timothy Arceri)
- fix move comparision for phis so we dont end up with:
vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
vec1 32 ssa_235 = feq ssa_227, ssa_1
vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235
- add nir_op_i2b/nir_op_f2b to the list of comparisons.
V3: (Timothy Arceri)
- tidy up suggested by Jason.
- add inot/fnot to move comparison list
V4: (Jason Ekstrand)
- clean up move_comparison_source
- get rid of the tuple
- rework phi handling
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This is more correct and should also be a tiny bit faster since we're
just comparing pointers instead of calling nir_src_equal.
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
this is to avoid following compilation error on Android:
error: control may reach end of non-void function [-Werror,-Wreturn-type]
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Geometry and Tessellation stages do handle this as a system value instead.
Fixes:
dEQP-VK.geometry.basic.primitive_id
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Vulkan, we always have both the TCS and TES available in the same
pipeline, so we can simply use the TCS OutputVertices execution mode
value as the TES PatchVertices built-in.
For GLSL, we handle this in the linker. But we could use this pass
in the case when both TCS and TES are linked together, if we wanted.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
...when the capability bit is set.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Iago suggested tidying this.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to:
- handle the extra array level for per-vertex varyings
- handle the patch qualifier correctly
- assign varying locations
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Use info->tess.
v3: Handle more things in either TCS/TES.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Dave Airlie <[email protected]> [v1]
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Annoyingly, SPIR-V lets you specify all of these fields in either the
TCS or TES, which means that we need to be able to store all of them
for either shader stage. Putting them in a union won't work.
Combining both is an easy solution, and given that the TCS struct only
had a single field, it's pretty inexpensive.
This patch renames the combined struct to "tess" to indicate that it's
for tessellation in general, not one of the two stages.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘copy_prop_vars_block’:
nir/nir_opt_copy_prop_vars.c:765: error: unknown field ‘deref’ specified in initializer
nir/nir_opt_copy_prop_vars.c:765: warning: missing braces around initializer
nir/nir_opt_copy_prop_vars.c:765: warning: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:765: warning: initialization from incompatible pointer type
Fixes: 62332d139c8f ("nir: Add a local variable-based copy propagation pass")
Signed-off-by: Vinson Lee <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
According to OpenGL Shading Language 4.50 spec, Section 8.7 "Vector
Relational Functions", functions of this type do not operate on scalar
types, so remove scalar types from signature definitions to make the
behavior consistent with glslangValidator and other drivers.
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Boyan Ding <[email protected]>
|
|
|
|
|
|
|
|
| |
The foreach loop was called both in the else case and right after. The
indentation seems to indicate that the extra call was from a previous
version with an else section with out curly brackets.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vtn_ssa_value() can produce variable loads, and the cursor might
be after a return statement, causing nir_builder assert failures
about not inserting instructions after a jump.
This fixes:
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_if
dEQP-VK.spirv_assembly.instruction.graphics.barrier.in_switch
Cc: "13.0 12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Use nir_spirv_supported_extensions to check if the feature is enabled.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far, input_reads was a bitmap tracking which vertex input locations
were being used.
In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.
But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.
To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.
As example, if in our GLSL/IR shader we have three attributes:
layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;
then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.
Checking carefully, basically we are using slots rather than locations
in NIR.
When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.
v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.
v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Fix indent in radv change
- Add vtn_u64_literal() helper to take 64 bits (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
SPIR-V does not have special opcodes for DF conversions. We need to identify
them by checking the bit size of the operand and the result.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function returns the nir_op corresponding to the conversion between
the given nir_alu_type arguments.
This function lacks support for integer-based types with bit_size != 32
and for float16 conversion ops.
v2:
- Improve readiness of the code and delete cases that don't happen now (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Refactor nir_get_nir_type_for_glsl_type() to avoid using unneeded helpers (Jason)
v3:
- Use return directly (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
_vtn_variable_copy()
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Add asserts.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
double-based vecs
We need to pick two 32-bit values per component to perform the right shuffle operation.
v2 (Jason):
- Add assert to check matching bit sizes (Jason)
- Simplify the code to pick components (Jason)
v3:
- Switch on bit_size once (Jason)
- Add comment to explain the constant value for unused components (Erik)
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Add assert.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason):
- Add assert.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This change also removes the now duplicate NumImages field.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BDW:
total instructions in shared programs: 13061877 -> 13060965 (-0.01%)
instructions in affected programs: 133569 -> 132657 (-0.68%)
helped: 566
HURT: 0
total cycles in shared programs: 256611784 -> 256599536 (-0.00%)
cycles in affected programs: 861016 -> 848768 (-1.42%)
helped: 379
HURT: 73
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BDW:
total instructions in shared programs: 13074882 -> 13068703 (-0.05%)
instructions in affected programs: 1823116 -> 1816937 (-0.34%)
helped: 4187
HURT: 537
total cycles in shared programs: 256622718 -> 256425382 (-0.08%)
cycles in affected programs: 123790120 -> 123592784 (-0.16%)
helped: 3823
HURT: 2037
total spills in shared programs: 15276 -> 14929 (-2.27%)
spills in affected programs: 9446 -> 9099 (-3.67%)
helped: 352
HURT: 1
total fills in shared programs: 20496 -> 20144 (-1.72%)
fills in affected programs: 13040 -> 12688 (-2.70%)
helped: 352
HURT: 1
LOST: 2
GAINED: 21
v2: Rely on 'a' being a well-formed boolean (Connor, Eric).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On BDW:
total instructions in shared programs: 13071119 -> 13070371 (-0.01%)
instructions in affected programs: 83424 -> 82676 (-0.90%)
helped: 505
HURT: 45 (all TCS, all hurt by a single instruction)
total cycles in shared programs: 256601322 -> 256588932 (-0.00%)
cycles in affected programs: 819410 -> 807020 (-1.51%)
helped: 450
HURT: 57
total loops in shared programs: 2950 -> 2942 (-0.27%)
loops in affected programs: 8 -> 0
helped: 7
HURT: 0
v2: Drop unnecessary 'a@bool' annotation (Connor, Eric).
Add a comment explaining the rule (Ian).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]> [v1]
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It feels weird using GL_* enums in a Vulkan driver.
v2: Fix the TESS_SPACING -> PIPE_TESS_SPACING conversion.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The vertex order is either clockwise or counterclockwise. We can just
store a "ccw" boolean rather than GLenum values. I don't want to use
GLenums in a Vulkan driver, and even in GL a simple boolean works fine.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I apparently broke mark_whole_variable in ir_set_program_inouts.
It was passing a type that wasn't var->type, so the wrapper didn't
work out. It's all broken, revert it and start over.
Fixes all kinds of things on other drivers.
Revert "glsl: Make is_fixed_function_array actually check for varyings."
This reverts commit 42699e12711668a142b7acf11c168cf4301c1295.
Revert "glsl: Mark whole variable used for ClipDistance and TessLevel*."
This reverts commit 5c580e64cc206ab160e1767c42e4d6c81f67da4d.
Revert "glsl: Override the # of varying slots for ClipDistance and TessLevel*."
This reverts commit 8b5749f65ac434961308ccb579fb8a816e4f29d5.
Revert "glsl: Create and use a new ir_variable::count_attribute_slots() wrapper."
This reverts commit 6aa5cb34d03765b7be8611aa516bc201bd337f73.
|
|
|
|
|
|
|
|
|
|
|
| |
We can't check VARYING_SLOT_* locations until we've determined that
the variable is actually a varying.
Fixes assert failures in drivers which actually use this path,
such as radeonsi and i915.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99314
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Marking operations as redundant if they are equal to the base
range is fine when the tree structure is something like this:
max
/ \
max b
/ \
3 max
/ \
3 a
But the opt falls apart with a tree like this:
max
/ \
max max
/ \ / \
3 a b 3
The problem is that both branches are treated the same: descending in
the left branch will prune the constant, and then descending the right
branch will prune the constant there as well, because limits[0] wasn't
updated to take the change on the left branch into account, and so we
still get [3,\infty) as baserange.
In order to fix the bug we just disable the marking of redundant expressions
when they match the baserange.
NIR algebraic opt will clean up the first tree for anyway, hopefully
other backends are smart enough to do this also.
Cc: "13.0" <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
|