| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pulls out the i965 IR definitions into a separate file and leaves
the top-level backend_shader structure and back-end compiler entry
points in brw_shader.h. The purpose is to keep things tidy and
prevent a nasty circular dependency between brw_cfg.h and
brw_shader.h. The logical dependency between these data structures
looks like:
backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h)
-> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h)
This circular header dependency is currently resolved by using forward
declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h
include brw_shader.h, which seems backwards and won't work at all when
the forward declarations of cfg_t/bblock_t are no longer sufficient in
a future commit.
Reviewed-by: Matt Turner <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.
Fixes: 932045061b5 ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Paulo Zanoni <[email protected]>
Tested-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This option will be used later by GLSL, so move to a common struct.
Because nir_options is filled in the compiler instead of the Vulkan
driver, fix that up. GLSL will ignore that for now.
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
|
|
|
|
|
|
|
|
| |
This is a query that both Intel and freedreno need to do. We can simplify
it a lot with the new glsl_get_sampler_dim_coordinate_components()
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction. Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.
If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.
The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b). However, changes
in MR !1359 push the negations back down.
On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().
On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).
There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.
See also piglit!224.
Fixes: 06d2c116415 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316:4: runtime error: null pointer passed as argument 1, which is declared to never be null
#0 0x7f78f5916611 in brw_nir_analyze_ubo_ranges ../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316
#1 0x7f78f255c189 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:97
#2 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#3 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#4 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#5 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#6 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#7 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#8 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#9 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#10 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#11 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#12 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
../src/intel/compiler/brw_fs.cpp:2247:46: runtime error: variable length array bound evaluates to non-positive value 0
#0 0x7f78f5697678 in fs_visitor::assign_constant_locations() ../src/intel/compiler/brw_fs.cpp:2247
#1 0x7f78f571d29e in fs_visitor::optimize() ../src/intel/compiler/brw_fs.cpp:7361
#2 0x7f78f574eb84 in fs_visitor::run_fs(bool, bool) ../src/intel/compiler/brw_fs.cpp:8022
#3 0x7f78f575641b in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8408
#4 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
#5 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#6 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#7 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#8 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#9 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#10 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#11 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#12 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#13 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#14 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#15 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
../src/intel/compiler/brw_nir.c:979:40: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
#0 0x7f78f590d10b in brw_nir_apply_sampler_key ../src/intel/compiler/brw_nir.c:979
#1 0x7f78f590e07b in brw_nir_apply_key ../src/intel/compiler/brw_nir.c:1057
#2 0x7f78f5754b45 in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8347
#3 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
#4 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#5 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#6 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#7 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#8 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#9 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#10 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#11 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#12 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#13 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#14 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.
This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies. Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.
Fixes: 6154cdf924f ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c01271 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
|
|
|
|
|
|
|
|
|
|
|
| |
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread. This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
In SIMD32 programs that don't use discard, the upper 16 bits of the UD
result of sample_mask_reg() don't contain the sample mask of the upper
16 channels as one would expect. Stop pretending we are returning a
valid 32-bit mask.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FIND_LIVE_CHANNEL was using f1.0-f1.1 as temporary flag register on
Gen7, instead use f0.0-f0.1. In order to avoid collision with the
discard sample mask, move the latter to f1.0-f1.1. This makes room
for keeping track of the sample mask of the second half of SIMD32
programs that use discard.
Note that some MOVs of the sample mask into f1.0 become redundant now
in lower_surface_logical_send() and lower_a64_logical_send().
Reviewed-by: Kenneth Graunke <[email protected]>x
|
|
|
|
|
|
|
|
|
| |
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard. This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
It's only really useful there. This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD32.
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs. I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it. Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message. This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.
Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue). Remove the
brw_wm_prog_data::replicate_alpha flag.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
control flow.
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d264becc "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
control flow.
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
| |
register.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found by inspection. This seems particularly likely to cause problems
with instructions dependent on the current execution mask like
SHADER_OPCODE_FIND_LIVE_CHANNEL or the FS_OPCODE_LOAD_LIVE_CHANNELS
instruction I'm about to introduce, but one could imagine it leading
to data corruption if CSE ever managed to combine two instructions
before and after the FS_OPCODE_PLACEHOLDER_HALT, since the one before
may not be executed for some channels.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: 20.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes valgrind errors introduced since commit a8ec4082.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2346
Fixes: a8ec4082 ("nir+vtn: vec8+vec16 support")
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3691>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Integer instructions don't coissue. Before e64be391dd0
("intel/compiler: generalize the combine constants pass"), this pass
only looked at float sources. There's no shader-db data in that commit,
so I collected some. The results are not good:
Haswell
total instructions in shared programs: 11898805 -> 11908127 (0.08%)
instructions in affected programs: 1218680 -> 1228002 (0.76%)
helped: 2
HURT: 5171
helped stats (abs) min: 12 max: 111 x̄: 61.50 x̃: 61
helped stats (rel) min: 1.59% max: 9.20% x̄: 5.40% x̃: 5.40%
HURT stats (abs) min: 1 max: 311 x̄: 1.83 x̃: 1
HURT stats (rel) min: 0.02% max: 9.91% x̄: 1.05% x̃: 0.70%
95% mean confidence interval for instructions value: 1.55 2.05
95% mean confidence interval for instructions %-change: 1.02% 1.08%
Instructions are HURT.
total cycles in shared programs: 221664974 -> 221404750 (-0.12%)
cycles in affected programs: 120012620 -> 119752396 (-0.22%)
helped: 3464
HURT: 3159
helped stats (abs) min: 1 max: 428160 x̄: 314.55 x̃: 16
helped stats (rel) min: <.01% max: 57.33% x̄: 3.40% x̃: 1.28%
HURT stats (abs) min: 1 max: 87846 x̄: 262.54 x̃: 14
HURT stats (rel) min: <.01% max: 85.57% x̄: 3.01% x̃: 0.77%
95% mean confidence interval for cycles value: -224.23 145.65
95% mean confidence interval for cycles %-change: -0.50% -0.19%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 9804 -> 10047 (2.48%)
spills in affected programs: 6869 -> 7112 (3.54%)
helped: 2
HURT: 41
total fills in shared programs: 19863 -> 20319 (2.30%)
fills in affected programs: 17428 -> 17884 (2.62%)
helped: 2
HURT: 41
LOST: 20
GAINED: 13
This also prevents regressions in "intel/fs: Promote integer constants
after lowering integer multiplication" (note: that patch will probably
not be committed). When the passes are reorderd, code like
mul(8) acc0<1>D g9<8,8,1>D -2078209981D { align1 1Q };
gets turned into
mov(1) g23<1>D 2078209981D { align1 WE_all 1N };
...
mul(8) acc0<1>D g13<8,8,1>D -g23<0,1,0>D { align1 1Q compacted };
It's not 100% clear why, but these produce different results. Note that
-2078209981 & 0x0ffff = 0x0843, and -(2078209981 & 0x0ffff) =
0xffff0843. It seems like the upper 16-bits of the negation should be
ignored.
Fixes: e64be391dd0 ("intel/compiler: generalize the combine constants pass")
Cc: Iago Toral Quiroga <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
The shaders with spills or fills hurt are the usual suspects. A couple
compute shaders in Dirt Showdown and a compute shader in Bioshock
Infinite. On Haswell, a compute shader (that appears twice in
shader-db) from Aztec Ruins was also hurt for spill and fills.
Haswell
total instructions in shared programs: 11573934 -> 11568335 (-0.05%)
instructions in affected programs: 828623 -> 823024 (-0.68%)
helped: 2825
HURT: 6
helped stats (abs) min: 1 max: 134 x̄: 2.16 x̃: 1
helped stats (rel) min: 0.02% max: 9.05% x̄: 0.84% x̃: 0.61%
HURT stats (abs) min: 1 max: 216 x̄: 81.83 x̃: 56
HURT stats (rel) min: 0.16% max: 8.65% x̄: 4.21% x̃: 4.68%
95% mean confidence interval for instructions value: -2.31 -1.64
95% mean confidence interval for instructions %-change: -0.85% -0.80%
Instructions are helped.
total cycles in shared programs: 187573593 -> 187004633 (-0.30%)
cycles in affected programs: 82816107 -> 82247147 (-0.69%)
helped: 2186
HURT: 1741
helped stats (abs) min: 1 max: 35230 x̄: 326.96 x̃: 16
helped stats (rel) min: <.01% max: 46.11% x̄: 3.11% x̃: 0.90%
HURT stats (abs) min: 1 max: 6138 x̄: 83.73 x̃: 16
HURT stats (rel) min: <.01% max: 104.11% x̄: 2.73% x̃: 0.75%
95% mean confidence interval for cycles value: -197.13 -92.64
95% mean confidence interval for cycles %-change: -0.72% -0.33%
Cycles are helped.
total spills in shared programs: 7870 -> 7743 (-1.61%)
spills in affected programs: 2260 -> 2133 (-5.62%)
helped: 31
HURT: 5
total fills in shared programs: 6320 -> 6263 (-0.90%)
fills in affected programs: 3547 -> 3490 (-1.61%)
helped: 31
HURT: 6
LOST: 9
GAINED: 9
Ivybridge
total instructions in shared programs: 11863372 -> 11859793 (-0.03%)
instructions in affected programs: 757183 -> 753604 (-0.47%)
helped: 2236
HURT: 3
helped stats (abs) min: 1 max: 81 x̄: 1.86 x̃: 1
helped stats (rel) min: 0.03% max: 5.26% x̄: 0.74% x̃: 0.48%
HURT stats (abs) min: 11 max: 301 x̄: 192.33 x̃: 265
HURT stats (rel) min: 1.55% max: 10.51% x̄: 6.89% x̃: 8.62%
95% mean confidence interval for instructions value: -2.01 -1.18
95% mean confidence interval for instructions %-change: -0.77% -0.70%
Instructions are helped.
total cycles in shared programs: 178377378 -> 177946087 (-0.24%)
cycles in affected programs: 76261390 -> 75830099 (-0.57%)
helped: 1635
HURT: 1395
helped stats (abs) min: 1 max: 34796 x̄: 333.53 x̃: 16
helped stats (rel) min: <.01% max: 47.15% x̄: 2.82% x̃: 0.64%
HURT stats (abs) min: 1 max: 4315 x̄: 81.74 x̃: 18
HURT stats (rel) min: <.01% max: 49.98% x̄: 1.99% x̃: 0.53%
95% mean confidence interval for cycles value: -197.06 -87.62
95% mean confidence interval for cycles %-change: -0.78% -0.43%
Cycles are helped.
total spills in shared programs: 4188 -> 4182 (-0.14%)
spills in affected programs: 1557 -> 1551 (-0.39%)
helped: 30
HURT: 3
total fills in shared programs: 5056 -> 5245 (3.74%)
fills in affected programs: 2708 -> 2897 (6.98%)
helped: 30
HURT: 3
LOST: 5
GAINED: 1
No shader-db changes on any other Intel platform.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3544>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3544>
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a hang in the following Vulkan CTS test on TGL-LP:
dEQP-VK.descriptor_indexing.storage_buffer_dynamic_in_loop
Cc: [email protected]
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
|
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, the validator tries to read the type of src1 of a SEND/SENDS
which doesn't actually have a type field. This prevents validation
issues in the next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: (Francisco Jerez)
- Drop vec4 changes.
- Handle explicit acc0 operand and implicit one.
- Make sure instruction is SIMD16, prediction is off and default mask
control set to true.
v3: (Francisco Jerez)
- Clear accumulator only when it's written.
- Use BRW_MASK_DISABLE instead of true.
- Use correct width for brw_acc_reg().
- Fix last_inst_offset.
v4: (Francisco Jerez)
- Don't check for last instruction for accummulator write.
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen12 does not support RENDER_SURFACE_STATE::SurfaceArray = true &&
RENDER_SURFACE_STATE::Depth = 0. SurfaceArray can only be set to true
if Depth >= 1.
We workaround this limitation by adding the max(value, 1) snippet in
the shaders on the 3 components for texture array sizes.
Tested on Gen9 with the following Vulkan CTS tests :
dEQP-VK.image.image_size.2d_array.*
v2: Drop debug print (Tapani)
Switch to GEN:BUG instead of Wa_
v3: Fix dEQP-VK.image.image_size.1d_array.* cases (Lionel)
v4: Fix dEQP-VK.glsl.texture_functions.query.texturesize.* cases
(Missing tex_op handling) (Lionel)
v5: Missing break statement (Lionel)
v6: Fixup comment (Tapani)
v7: Fixup comment again (Tapani)
v8: Don't use sample_dim as index (Jason)
Rename pass
Simplify control flow
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]> (v7)
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
|
|
|
|
|
|
|
|
| |
Fixes: 58907568ec5 ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops")
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3558>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3558>
|
|
|
|
|
|
|
|
| |
Instead of emitting g[a0]UD for the indirect descriptor, emit a0<0>UD.
This is more correct because there is no GRF involved.
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
|
|
|
|
|
|
|
|
|
|
| |
The instruction encoding for SENDS changed on Gen12 and it now supports
embedding the entire extended message descriptor in the instruction if
it's an immediate. Stop falling back to doing an indirect SEND just
because we had something in [15:12] of ex_desc.ud.
Reviewed-by: Kenneth Graunke <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3499>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Use new lower_hadd64 and lower_usub_sat64 flags.
v3: Enable SPIR-V capability.
v4: Move lowering options to COMMON_SCALAR_OPTIONS. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Remove smashing type to D for nir_op_irhadd. Caio noticed it was
odd, and removing it fixes an assertion failure in the crucible
func.shader.averageRounded.int64_t test (because the source should be
W).
v3: Emit BRW_OPCODE_MUL directly for nir_op_umul_32x16 and
nir_op_imul_32x16. Suggested by Curro.
v4: Smash types of MUL instruction generated for nir_op_umul_32x16 and
nir_op_imul_32x16. With this change, I get the same assembly now as I
did with v2.
v5: Remove support for pre-Gen7. The integer multiply path was
incorrect, and, since the extension isn't enabled pre-Gen7, there's no
way to test it.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by
Caio.
v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested
by Ken on IRC.
v4: Fix a typo in a comment. Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
|
|
|
|
|
|
|
|
|
| |
v2: Move the check to fs_visitor::lower_integer_multiplication.
Previously the cases where lowering was skipped, the original
instruction was removed by fs_visitor::lower_integer_multiplication.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).
Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.
To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.
I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:
sat(trunc(-0.5)) = 0.0
But on Gen4/5 where sat(trunc(x)) is implemented as
rndz.r.f0 result, x // result = floor(x)
// set f0 if increment needed
(+f0) add result, result, 1.0 // fixup so result = trunc(x)
then putting saturate on both instructions will give the wrong result.
floor(-0.5) = -1.0
sat(floor(-0.5)) = 0.0
// +1 increment would be needed since floor(-0.5) != trunc(-0.5)
sat(sat(floor(-0.5)) + 1.0) = 1.0
Fixes: 6f394343b1f ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
|
|
|
|
|
|
|
|
| |
This was the only warning I could see while compiling Iris.
Reviewed-by: Matt Turner <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2821>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2821>
|
|
|
|
|
| |
Signed-off-by: Timur Kristóf <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
With the previous commits we can now enable the unit test on Gen <= 12.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
| |
... before giving them to the instruction compactor.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
| |
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
|
|
|
|
| |
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
|
|
| |
In order to fuzz test instructions, we first need to do some sanity
checking first. Factoring out this function allows us an easy way to
validate a single instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
|
|
|
|
| |
16-bit immediates need to be replicated through the 32-bit immediate
field, so we should never see one that isn't.
This does happen however in the fuzzer unit test, so returning false
allows the fuzzer to reject this case.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
| |
Necessary to handle these cases when we test fuzzed instructions.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were sharing tables between generations that were nearly
identical (i.e., Gen8 3-src adds HF support) and used a small bit of
code to handle the differences. This is kind of a mess if you want to
reject 64-bit types on platforms that don't support 64-bit types, so
split the tables, allowing each generation's table to list exactly what
it supports.
Acked-by: Caio Marcelo de Oliveira Filho <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
|