| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
v2: Delete some stray debug code notice by Iago.
v3: Massive rebase on new ir_function_signature::intrinsic_id mechanism.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]> [v1]
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Otherwise grepping for where atomic_counter_inc and friends are defined
is a very frustrating experience.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just generate an __intrinsic_atomic_add with a negated parameter.
Some background on the non-obvious reasons for the the big change to
builtin_builder::call()... this is cribbed from some discussion with
Ilia on mesa-dev.
Why change builtin_builder::call() to allow taking dereferences and
create them here rather than just feeding in the ir_variables directly?
The problem is the neg_data ir_variable node would have to be in two
lists at the same time: the instruction stream and parameters. The
ir_variable node is automatically added to the instruction stream by the
call to make_temp. Restructuring the code so that the ir_variables
could be in parameters then move them to the instruction stream would
have been pretty terrible.
ir_call in the instruction stream has an exec_list that contains
ir_dereference_variable nodes.
The builtin_builder::call method previously took an exec_list of
ir_variables and created a list of ir_dereference_variable. All of the
original users of that method wanted to make a function call using
exactly the set of parameters passed to the built-in function (i.e.,
call __intrinsic_atomic_add using the parameters to atomicAdd). For
these users, the list of ir_variables already existed: the list of
parameters in the built-in function signature.
This new caller doesn't do that. It wants to call a function with a
parameter from the function and a value calculated in the function. So,
I changed builtin_builder::call to take a list that could either be a
list of ir_variable or a list of ir_dereference_variable. In the former
case it behaves just as it previously did. In the latter case, it uses
(and removes from the input list) the ir_dereference_variable nodes
instead of creating new ones.
text data bss dec hex filename
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so before
6036923 283160 28608 6348691 60df93 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so before
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This necessetated renaming the is_intrinsic field to _is_intrinsic. The
next commit will remove the field.
text data bss dec hex filename
6036507 283160 28608 6348275 60ddf3 lib64/i965_dri.so before
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
6038043 283160 28608 6349811 60e3f3 lib64/i965_dri.so before
6036507 283160 28608 6348275 60ddf3 lib64/i965_dri.so after
v2: s/ir_intrinsic_atomic_sub/ir_intrinsic_atomic_counter_sub/. Noticed
by Ilia.
v3: Silence unhandled enum in switch warnings in st_glsl_to_tgsi.
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
6037483 283160 28608 6349251 60e1c3 lib64/i965_dri.so before
6038043 283160 28608 6349811 60e3f3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Acked-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
st_glsl_to_tgsi only calls lower_instructions once (instead of in a
loop), so the ir_binop_carry generated would not get lowered. Fixes
assertion failure
state_tracker/st_glsl_to_tgsi.cpp:2265: void glsl_to_tgsi_visitor::visit_expression(ir_expression*, st_src_reg*): Assertion `!"Invalid ir opcode in glsl_to_tgsi_visitor::visit()"' failed.
on softpipe in 16 piglit tests:
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended.shader_test
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit fc03ecfeaf5a10a8b84d366f24f02e74ab03b145.
Chad had already pushed the same change between me posting the patch and Jason
pushing it: 44bcf1ffcced04fd7f2b (".gitignore: Ignore src/compiler/spirv2nir")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This fixes an uninitialized warning for is_vertex_input.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This just translates to the correct cull distance slot.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
We need these for spir-v/nir shaders.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were saving off the last nir_block in a vtn_block before
moving on so that we could find the nir_block again when it came time to
handle phi sources. Unfortunately, NIR's control flow modification code is
inconsistent when it comes to how it splits blocks so the block pointer we
saved off may point to a block somewhere else in the shader by the time we
get around to handling phi sources. In order to get around this, we insert
a nop instruction and use that as the logical end of our block. Since the
control flow manipulation code respects instructions, the nop will keeps
its place like any other instruction and we can easily find the end of our
block when we need it.
This fixes a bug triggered by a couple of vkQuake shaders.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
Cc: "12.0" <[email protected]>
Tested-by: Dave Airlie <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This intrinsic has no destination, no sources, no variables, and can be
eliminated. In other words, it does nothing and will always get deleted by
dead code elimination. However, it does provide a quick-and-easy way to
temporarily tag a particular location in a NIR shader.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
While the current CFG code is valid in the case where a switch break also
happens to be a loop continue, it's a bit suboptimal. Since hardware is
capable of handling the continue as a direct jump, it's better to use a
continue instruction when we can than to bother with all of the nasty
switch break lowering.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that the break block of a switch is actually the continue of
the loop containing the switch. In this case, we need to identify the
break block as a continue and break out of current level of CFG handling.
If we don't, the continue portion of the loop will get handled twice, once
by following after the break and a second time by the loop handling code
handling it explicitly.
This fixes 6 of the new Vulkan CTS tests:
- dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
- dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
CovID: 1373369
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
| |
Acked-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I found this in a shader that was doing an alpha test when alpha is fixed
at 1.0.
v2: Rebase on master (now the const value is "u32" not "u").
Reviewed-by: Jason Ekstrand <[email protected]> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code provides for an on-disk cache of objects. Objects are stored
and retrieved via names that are arbitrary 20-byte sequences,
(intended to be SHA-1 hashes of something identifying for the
content). The directory used for the cache can be specified by means
of environment variables in the following priority order:
$MESA_GLSL_CACHE_DIR
$XDG_CACHE_HOME/mesa
<user-home-directory>/.cache/mesa
By default the cache will be limited to a maximum size of 1GB. The
environment variable:
$MESA_GLSL_CACHE_MAX_SIZE
can be set (at the time of GL context creation) to choose some other
size. This variable is a number that can optionally be followed by
'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
gigabytes. By default, an unadorned value will be interpreted as
gigabytes.
The cache will be entirely disabled at runtime if the variable
MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.
Many thanks to Kristian Høgsberg <[email protected]> for the initial
implementation of code that led to this patch. In particular, the idea
of using an mmapped file, (indexed by a portion of the SHA-1), for the
efficent implementation of cache_has_key was entirely his
idea. Kristian also provided some very helpful advice in discussions
regarding various race conditions to be avoided in this code.
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
At this point in the code, s must be visit_continue. If the child
returned visit_stop, visit_stop is the only correct thing to return.
Found by inspection.
Signed-off-by: Ian Romanick <[email protected]>
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise extensions to 1.40 that are only for core profile won't work.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
max_unroll_iterations was moved into options a long, long time ago.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
| |
This makes link_assign_uniform_locations() easier to follow.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
This makes link_assign_uniform_locations() easier to follow.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise we can end up with mismatching names between the cached
binary and the cached metadata.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the shader-cache work an upcoming change will add new
references to _mesa_add_parameter and _mesa_new_parameter_list from
the glsl code. To prepare for that, and to allow the standalone
glsl_compiler to still link, here we add mesa/program/prog_parameter.c
to the libglsl_util sources.
Then, in order to get *that* to work, we also add to stubs to
standalone_scaffolding:
_mesa_program_state_flags
_mesa_program_state_string
These functions aren't actually used by the two functions in
prog_parameter.c that we are actually calling. They are used in other
functions in the same file. So we don't care what the implementation
of these stubs is, (they won't be called by glsl_compiler). We just
need the stubs present so that it can link.
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
This reverts commit f5a6aab4031bc4754756c1773411728ad9a73381.
This broke some tests. It seems gl_transform_feedback_info gets memset
to 0 so we were losing the values in BufferStride before we used them.
|
|
|
|
|
|
|
|
|
|
| |
Now that we generate built-in functions inline, there's no need to link
against the built-in shader, and no built-in prototypes to consider.
This lets us delete a bunch of code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by; Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This is now handled directly by ast_function.cpp.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by; Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the past, we imported the prototypes of built-in functions, generated
calls to those, and waited until link time to resolve the calls and
import the actual code for the built-in functions.
This severely limited our compile-time optimization opportunities: even
trivial functions like dot() were represented as function calls. We
also had no way of reasoning about those calls; they could have been
1,000 line functions with side-effects for all we knew.
Practically all built-in functions are trivial translations to
ir_expression opcodes, so it makes sense to just generate those inline.
Since we eventually inline all functions anyway, we may as well just do
it for all built-in functions.
There's only one snag: built-in functions that refer to built-in global
variables need those remapped to the variables in the shader being
compiled, rather than the ones in the built-in shader. Currently,
ftransform() is the only function matching those criteria, so it seemed
easier to just make it a special case.
On Skylake:
total instructions in shared programs: 12023491 -> 12024010 (0.00%)
instructions in affected programs: 77595 -> 78114 (0.67%)
helped: 97
HURT: 309
total cycles in shared programs: 137239044 -> 137295498 (0.04%)
cycles in affected programs: 16714026 -> 16770480 (0.34%)
helped: 4663
HURT: 4923
while these statistics are in the wrong direction, the number of
hurt programs is small (309 / 41282 = 0.75%), and I don't think
anything can be done about it. A change like this significantly
alters the order in which optimizations are performed.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by; Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to check prior to optimization - otherwise we might fail to
detect cases where barrier() is in control flow which is always taken
(and therefore gets optimized away).
We don't currently loop unroll if there are function calls inside;
otherwise we might have a problem detecting barrier() in loops that
get unrolled as well.
Tapani's switch handling code adds a loop around switch statements, so
even with the mess of if ladders, we'll properly reject it.
Enforcing these rules at compile time makes more sense more sense than
link time. Doing it at ast-to-hir time (rather than as an IR pass)
allows us to emit an error message with proper line numbers.
(Otherwise, I would have preferred the IR pass...)
Fixes spec/arb_tessellation_shader/compiler/barrier-switch-always.tesc.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by; Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
It makes more sense to have this here where we store the other values
from xfb qualifiers. The struct it was previously part of is now only
used to store values that come from the api.
Reviewed-by: Alejandro Piñeiro <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The expectation is that drivers will set this based on
OES_geometry_shader and ARB_viewport_array support. This is a separate
enable on the same reasoning as for OES_texture_cube_map_array.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
OpAtomicLoad/Store should have pointer to images just like the rest of the
atomic operators. These couple of lines were poorly copied from the
ssbo/shared_vars cases (the only ones currently tests by the CTS).
Fixes 2afb950161f8 ("spirv/nir: Add support for OpAtomicLoad/Store")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <[email protected]> (v1)
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|