aboutsummaryrefslogtreecommitdiffstats
path: root/src/intel/compiler
Commit message (Collapse)AuthorAgeFilesLines
* intel/compiler: Increase nir_opt_peephole_select thresholdIan Romanick2019-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I tried 2, 4, 6, 8, and 10. 8 seemed to be the sweet spot across all Intel platforms. Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Matt Turner <[email protected]> All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 14736141 -> 14661140 (-0.51%) instructions in affected programs: 2272413 -> 2197412 (-3.30%) helped: 8416 HURT: 140 helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6 helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20% HURT stats (abs) min: 1 max: 140 x̄: 4.73 x̃: 1 HURT stats (rel) min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60% 95% mean confidence interval for instructions value: -9.36 -8.17 95% mean confidence interval for instructions %-change: -4.14% -3.99% Instructions are helped. total cycles in shared programs: 231560416 -> 228585416 (-1.28%) cycles in affected programs: 126536021 -> 123561021 (-2.35%) helped: 7092 HURT: 1898 helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159 helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77% HURT stats (abs) min: 1 max: 14518 x̄: 371.91 x̃: 36 HURT stats (rel) min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55% 95% mean confidence interval for cycles value: -514.34 -147.50 95% mean confidence interval for cycles %-change: -9.69% -9.14% Cycles are helped. total spills in shared programs: 5763 -> 5848 (1.47%) spills in affected programs: 1797 -> 1882 (4.73%) helped: 13 HURT: 13 total fills in shared programs: 17163 -> 16931 (-1.35%) fills in affected programs: 7214 -> 6982 (-3.22%) helped: 22 HURT: 19 total sends in shared programs: 730410 -> 730246 (-0.02%) sends in affected programs: 2705 -> 2541 (-6.06%) helped: 114 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88% 95% mean confidence interval for sends value: -1.55 -1.33 95% mean confidence interval for sends %-change: -7.90% -6.62% Sends are helped. LOST: 4 GAINED: 0 Sandy Bridge total instructions in shared programs: 10760511 -> 10724637 (-0.33%) instructions in affected programs: 961305 -> 925431 (-3.73%) helped: 3734 HURT: 110 helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8 helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95% HURT stats (abs) min: 1 max: 20 x̄: 1.68 x̃: 1 HURT stats (rel) min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52% 95% mean confidence interval for instructions value: -9.76 -8.91 95% mean confidence interval for instructions %-change: -4.90% -4.63% Instructions are helped. total cycles in shared programs: 153359411 -> 152991077 (-0.24%) cycles in affected programs: 11615401 -> 11247067 (-3.17%) helped: 2725 HURT: 1138 helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80 helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91% HURT stats (abs) min: 1 max: 4351 x̄: 69.69 x̃: 25 HURT stats (rel) min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47% 95% mean confidence interval for cycles value: -103.18 -87.52 95% mean confidence interval for cycles %-change: -4.57% -3.97% Cycles are helped. total sends in shared programs: 584038 -> 583855 (-0.03%) sends in affected programs: 3512 -> 3329 (-5.21%) helped: 157 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1 helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06% 95% mean confidence interval for sends value: -1.26 -1.07 95% mean confidence interval for sends %-change: -7.17% -5.87% Sends are helped. LOST: 23 GAINED: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8122617 -> 8111592 (-0.14%) instructions in affected programs: 380503 -> 369478 (-2.90%) helped: 912 HURT: 86 helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9 helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57% HURT stats (abs) min: 1 max: 2 x̄: 1.05 x̃: 1 HURT stats (rel) min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36% 95% mean confidence interval for instructions value: -12.00 -10.10 95% mean confidence interval for instructions %-change: -3.56% -3.10% Instructions are helped. total cycles in shared programs: 188509780 -> 188534398 (0.01%) cycles in affected programs: 7211542 -> 7236160 (0.34%) helped: 859 HURT: 132 helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16 helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33% HURT stats (abs) min: 2 max: 1592 x̄: 489.67 x̃: 618 HURT stats (rel) min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26% 95% mean confidence interval for cycles value: 9.58 40.10 95% mean confidence interval for cycles %-change: 0.65% 2.93% Cycles are HURT.
* intel/fs: Disable conditional discard optimization on Gen4 and Gen5Ian Romanick2019-11-211-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of valid data and 31 bits of junk. Results of comparisons that are used as Boolean values need to have a fixup applied to generate the proper 0/~0 values. Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup code from being generated. This results in a sequence like: cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */ ... cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */ (+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD g8<8,8,1>UD instead of cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */ ... cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */ or(16) g4<1>UD g4<8,8,1>UD g8<8,8,1>UD (+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD 1UD I examined a couple of the shaders hurt by this change, and ALL of them would have been affected by this bug. :( Reviewed-by: Tapani Pälli <[email protected]> Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836 Fixes: 0ba9497e66a ("intel/fs: Improve discard_if code generation") Iron Lake total instructions in shared programs: 8122757 -> 8122957 (<.01%) instructions in affected programs: 8307 -> 8507 (2.41%) helped: 0 HURT: 100 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76% 95% mean confidence interval for instructions value: 2.00 2.00 95% mean confidence interval for instructions %-change: 2.58% 3.03% Instructions are HURT. total cycles in shared programs: 188510100 -> 188510376 (<.01%) cycles in affected programs: 76018 -> 76294 (0.36%) helped: 0 HURT: 55 HURT stats (abs) min: 2 max: 12 x̄: 5.02 x̃: 4 HURT stats (rel) min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56% 95% mean confidence interval for cycles value: 4.33 5.71 95% mean confidence interval for cycles %-change: 0.60% 1.12% Cycles are HURT. GM45 total instructions in shared programs: 4994403 -> 4994503 (<.01%) instructions in affected programs: 4212 -> 4312 (2.37%) helped: 0 HURT: 50 HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72% 95% mean confidence interval for instructions value: 2.00 2.00 95% mean confidence interval for instructions %-change: 2.45% 3.07% Instructions are HURT. total cycles in shared programs: 128928750 -> 128928982 (<.01%) cycles in affected programs: 67442 -> 67674 (0.34%) helped: 0 HURT: 47 HURT stats (abs) min: 2 max: 12 x̄: 4.94 x̃: 4 HURT stats (rel) min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53% 95% mean confidence interval for cycles value: 4.19 5.68 95% mean confidence interval for cycles %-change: 0.50% 1.00% Cycles are HURT.
* Revert "i965/fs: Merge CMP and SEL into CSEL on Gen8+"Jason Ekstrand2019-11-202-108/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 52c7df1643ec9af119fd66f916f7fbdbcc798d2d. The pass, while clearly useful for some shaders, has at least three bugs that I was able to find fairly quickly: 1. It doesn't work for type-converting MOVs because f > 0 is not the same as f2i(f) > 0 2. CSEL is a 3src instruction and only supports one source type; it doesn't take this into account and tries to create instructions which do a F compare and a D select. This is especially nasty to debug because you don't see that in the dumped assembly because we don't properly assert that types are the same in codegen. 3. While you can handle 2, in theory, by reinterpreting types, you can't do that in the presence of source modifiers. This pass doesn't even attempt to detect that. Those are just the ones I found with the one almost trival shader I was debugging. There very likely may be more and. Best thing to do for now is just shut it off until someone has the time to figure out how to do this properly and write tests to ensure it's correct. Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+" Reviewed-by: Brian Paul <[email protected]>
* nir: move data.image.access to data.accessMarek Olšák2019-11-191-2/+2
| | | | | | The size of the data structure doesn't change. Reviewed-by: Connor Abbott <[email protected]>
* intel/compiler: Don't change hstride if not neededIván Briano2019-11-181-5/+6
| | | | | | | | | | | | Alignment requirements may have changed the horizontal stride already, so don't set it if not required to avoid breaking said requirements. Fixes several tests such as dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t Signed-off-by: Iván Briano <[email protected]> Reviewed-by: Paulo Zanoni <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Add a flag to avoid compacting push constantsJason Ekstrand2019-11-183-145/+167
| | | | | | | In vec4, we can just not run the pass. In fs, things are a bit more deeply intertwined. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: remove old commentItalo Nicola2019-11-181-3/+0
| | | | | | | This comment was correct some time ago, but since commit d3c10ad42729c1fe74a7f7c67465bd2, it isn't true anymore. Reviewed-by: Paulo Zanoni <[email protected]>
* intel/fs: Do not lower large local arrays to scratch on gen7Danylo Piliaiev2019-11-141-1/+5
| | | | | | | | | | | | | | On gen7 and earlier the scratch space size is limited to 12kB. By enabling this optimization we may easily exceed this limit without having any fallback. arb_compute_shader/linker/bug-93840.shader_test crashes with this lowering on IVB due to exceeding scratch size limit. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092 Fixes: 69244fc7 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: fix nir_op_{i,u}*32 on ICLPaulo Zanoni2019-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | On ICL we have the src1 restriction which is applied through fix_byte_src() and potentially changes the type of the operands from 8 to 32 bits. When this change happens, we fall into the "else if (bit_size < 32)" case and miscompute src_type because it takes into consideration bit_size (8) instead of the adjusted size of temp_op (32). This results in the shader reading unused memory, giving us mostly failures, but occasional passes due to whatever was already in the registers we were reading. This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as: dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2 This can also be verified by simply changing fix_byte_src() to apply on all platforms. Fixes: 5847de6e9afe ("intel/compiler: don't use byte operands for src1 on ICL") Reviewed-by: Ivan Briano <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/fs: Lower large local arrays to scratchJason Ekstrand2019-11-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shader-db results on Kaby Lake: total instructions in shared programs: 14929212 -> 14880028 (-0.33%) instructions in affected programs: 72428 -> 23244 (-67.91%) helped: 6 HURT: 2 helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624 helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08% HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178 HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97% 95% mean confidence interval for instructions value: -11947.03 -348.97 95% mean confidence interval for instructions %-change: -125.72% 202.37% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 368585300 -> 342557344 (-7.06%) cycles in affected programs: 28144921 -> 2116965 (-92.48%) helped: 6 HURT: 2 helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682 helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28% HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788 HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59% 95% mean confidence interval for cycles value: -5900438.73 -606550.27 95% mean confidence interval for cycles %-change: -140.79% 146.16% Inconclusive result (%-change mean confidence interval includes 0). total spills in shared programs: 9243 -> 8901 (-3.70%) spills in affected programs: 2718 -> 2376 (-12.58%) helped: 4 HURT: 4 total fills in shared programs: 21831 -> 10141 (-53.55%) fills in affected programs: 11804 -> 114 (-99.03%) helped: 6 HURT: 2 total sends in shared programs: 815912 -> 815912 (0.00%) sends in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 1 GAINED: 3 The helped shaders are all compute shaders in Aztec Ruins. There is also a compute shader in synmark2 OglCSDof that's helped but it doesn't show up in above shader-db results because it went from SIMD8 to SIMD16. That shader improves enough to yield an 15-20% performance boost to the benchmark as a whole on my KBL laptop. The hurt shaders are a couple shaders in Kerbal Space Program and a couple in Aztec Ruins. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Implement the new load/store_scratch intrinsicsJason Ekstrand2019-11-115-17/+241
| | | | | | | | | | | | | | | | | This commit fills in a number of different pieces: 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the new intrinsics. This involves simple plumbing work as well as a tiny bit of extra logic to always scalarize scratch intrinsics 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics into byte/dword scattered read/write messages which use the A32 stateless model. 3. Add code to lower_surface_logical_send to handle dword scattered messages and the A32 stateless model. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Plumb devinfo through lower_mem_access_bit_sizesJason Ekstrand2019-11-113-9/+14
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: refactor surface header setupJason Ekstrand2019-11-111-23/+16
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add DWord scattered read/write opcodesJason Ekstrand2019-11-115-0/+66
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizesJason Ekstrand2019-11-111-37/+15
| | | | | | | The new helper solves most of the annoying problems with data wrangling in brw_nir_lower_mem_access_bit_sizes. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: remove the operand restriction for src1 on GLKPaulo Zanoni2019-11-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Commit 5847de6e9afe implemented a restriction that applies to ICL, but wrongly marked it as also applying to GLK. Reviewers or MR !1125 pointed this, and the commit history shows removal of GLK to parts of the patch, but it turns there was still a left-over GLK check in the code. This code was breaking some of the i8vec2 tests on GLK, for example: dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2 Removing the GLK check solves the issue for GLK. I don't see a reason on why implementing this restriction would actually break GLK, so there's still more to investigate here since this bug may be affecting ICL+, but let's apply the real GLK fix while we analyze and discuss the other possible issues. Fixes: 5847de6e9afe ("intel/compiler: don't use byte operands for src1 on ICL") BSpec: 3017 Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/compiler: Report the number of non-spill/fill SEND messages on vec4 tooIan Romanick2019-10-301-5/+35
| | | | | | | | | | | | | | | | | | This make shader-db's report.py work on Haswell and earlier platforms. The problem is that the script would detect the "sends" output for scalar shaders and expect in in vec4 shaders too. When it didn't find it, the script would fail with: Traceback (most recent call last): File "./report.py", line 351, in <module> main() File "./report.py", line 182, in main before_count = before[p][m] KeyError: 'sends' Fixes: f192741ddd8 ("intel/compiler: Report the number of non-spill/fill SEND messages") Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu/validate/gen12: Add TGL to eu_validate tests.Jordan Justen2019-10-301-0/+9
| | | | | | | | | | | | These reworks were combined into this patch: * Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+ * Francisco Jerez: intel/eu/validate/gen12: Disable qword_low_power_no_depctrl eu_validate test. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add instruction compaction support on Gen12Matt Turner2019-10-302-184/+868
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Make separate src0/src1 index tablesMatt Turner2019-10-301-11/+18
| | | | | | TGL uses different data (and even a different format!) for each source. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Inline get_src_index()Matt Turner2019-10-301-26/+15
| | | | | | | TGL will have separate tables for src0 and src1, so the shared function will no longer make sense. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Restructure instruction compaction in preparation for Gen12Matt Turner2019-10-301-20/+28
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Remove unreachable() from brw_reg_type.cMatt Turner2019-10-301-3/+3
| | | | | | | | | | | | | | | | | The EU compaction unit test fuzzes the compaction code by flipping bits. We use a simple skip_bits() function with a list of reserved bits to ignore, but for more complex cases like invalid combinations of register file:type, we need either machinery to check validity or for these functions to simply inform us whether a combination was valid. enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it with an "INVALID" value, just return -1 and let the caller check for that. Scott suggested redefining unreachable() within the unit test to longjmp() which would allow driver code like this to still use it and allow the test to handle expected failures like this. If that plan works out, I plan to revert this.
* intel/vec4: Set brw_stage_prog_data::has_ubo_pullJason Ekstrand2019-10-301-0/+2
| | | | | | | | | | | | | | In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates whether any UBO pulls ever occur. Unfortunately, he neglected to set the bit in the vec4 back-end. This was fine at the time because the optimization was intended for iris which does not support gen7 and using the vec4 back-end on Gen8+ requires an environment variable. We want to use this in Vulkan which does support Gen7 so we want the information from the vec4 back-end as well as scalar. Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..." Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* util: rename list_empty() to list_is_empty()Timothy Arceri2019-10-281-3/+3
| | | | | | | This makes it clear that it's a boolean test and not an action (eg. "empty the list"). Reviewed-by: Eric Engestrom <[email protected]>
* intel/compiler: Fix C++ one definition rule violationsDanylo Piliaiev2019-10-284-20/+20
| | | | | | | | When building with "-flto" brw::block_data definitions were colliding. Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Implement scoped_memory_barrierCaio Marcelo de Oliveira Filho2019-10-241-8/+19
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Check for NULL key in fs_visitor constructorMichel Dänzer2019-10-241-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Flagged by UBSan: ../src/intel/compiler/brw_fs_visitor.cpp:986:20: runtime error: member access within null pointer of type 'const struct brw_base_prog_key' #0 0x559fadb48556 in fs_visitor::init() ../src/intel/compiler/brw_fs_visitor.cpp:986 #1 0x559fadb46db3 in fs_visitor::fs_visitor(brw_compiler const*, void*, void*, brw_base_prog_key const*, brw_stage_prog_data*, nir_shader const*, unsigned int, int, brw_vue_map const*) ../src/intel/compiler/brw_fs_visitor.cpp:962 #2 0x559fad9c7cd8 in saturate_propagation_fs_visitor::saturate_propagation_fs_visitor(brw_compiler*, brw_wm_prog_data*, nir_shader*) (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x61bcd8) #3 0x559fad9960a1 in saturate_propagation_test::SetUp() ../src/intel/compiler/test_fs_saturate_propagation.cpp:65 #4 0x559fadd7a32d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #5 0x559fadd65c3b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #6 0x559fadd0af75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2470 #7 0x559fadd0d8a4 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656 #8 0x559fadd10032 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774 #9 0x559fadd2ba0c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649 #10 0x559fadd7df46 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #11 0x559fadd69613 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #12 0x559fadd2302e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257 #13 0x559fadda2d61 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233 #14 0x559fadda2c21 in main ../src/gtest/src/gtest_main.cc:37 #15 0x7fe8f6748bba in __libc_start_main ../csu/libc-start.c:308 #16 0x559fad9950f9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x5e90f9) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Adam Jackson <[email protected]>
* intel/compiler: Cast to target type before shifting leftMichel Dänzer2019-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Otherwise a smaller type may be promoted to int, which can hit undefined behaviour: ../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int' #0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66 #1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70 #2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474 #5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656 #6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774 #7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649 #8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257 #11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233 #12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37 #13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308 #14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89) Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Adam Jackson <[email protected]>
* intel/compiler: Don't left-shift by >= the number of bits of the typeMichel Dänzer2019-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid it, use the modulo of the number of bits in the value being shifted, which is presumably what ended up happening on x86. Flagged by UBSan: ../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int' #0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974 #1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851 #2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106 #3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486 #4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474 #7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656 #8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774 #9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649 #10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402 #11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438 #12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257 #13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233 #14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37 #15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308 #16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9) Reviewed-by: Adam Jackson <[email protected]>
* intel/compiler: Refactor disassembly of sources in 3src instructionSagar Ghuge2019-10-211-19/+10
| | | | | Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Don't move immediate in registerSagar Ghuge2019-10-211-0/+38
| | | | | | | | | On Gen12, we support mixed mode HF/F operands, and also 3 source instruction supports immediate value support, so keep immediate as it is, if it fits properly in 16 bit field. Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Set bits according to source fileSagar Ghuge2019-10-211-2/+12
| | | | | | | | On Gen >= 12, if src0 or src2 holds immediate value, we need set src[0/2]_is_imm bits instead of register file. Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Add Immediate support for 3 source instructionSagar Ghuge2019-10-211-21/+32
| | | | | | | | On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but not both. Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Remove emit_alpha_to_coverage workaround from backendSagar Ghuge2019-10-212-84/+13
| | | | | | | | | | | | | Remove emit_alpha_to_coverage workaround from backend compiler and start using ported workaround from NIR. v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho) Fixes piglit test on HSW: - arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* nir: Add alpha_to_coverage lowering passSagar Ghuge2019-10-213-0/+171
| | | | | | | | | | | | | | | | | | | | | Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround() in intel/compiler. v2 (Caio Marcelo de Oliveira Filho): - Track store output and sample mask instruction - Nest math insturction for more readability - Bail out early if no gl_SampleMask v3: (Caio Marcelo de Oliveira Filho): - Do math instructions after instruction block - Restructure code - Move pass under src/intel/compiler v4: (Caio Marcelo de Oliveira Filho): - Organize dither mask calculation Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/compiler: Report the number of non-spill/fill SEND messagesKenneth Graunke2019-10-171-5/+30
| | | | | | | | | | This can be useful to measure whether memory access optimizations are having the desired effect. For example, we might see a reduction in image loads/stores, or constant buffer loads. We can already see this in cycle estimates to some extent, but this is a more direct approach, minus a lot of the noise of random scheduler shuffling. Reviewed-by: Jason Ekstrand <[email protected]>
* intel/vec4: Don't try both sources as immediates for DPHIan Romanick2019-10-171-1/+1
| | | | | | | | | | DPH isn't actually commutative, so this doesn't work. If the immediate in src0 would be a VF candidate, we could do better. *shrug* No shader-db changes on any Intel platform. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Fixes: b04beaf41d2 ("intel/vec4: Try both sources as candidates for being immediates")
* intel/fs/gen12: Add tests for scoreboard passCaio Marcelo de Oliveira Filho2019-10-172-1/+864
| | | | | | | | | | | | | Tests the combinations of cases of RAW, WAW and WAR hazards involving both inorder and outoforder instructions. Also tests that dependencies combine and propagate correctly through control flow (loops and conditionals). v2: Add an extra test illustrating that the non-logical CFG edge between then-block and else-block is being taking into account. (Curro) Reviewed-by: Francisco Jerez <[email protected]>
* intel/fs/gen12: Use TCS 8_PATCH mode.Kenneth Graunke2019-10-112-6/+8
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* intel/fs/gen12: Implement gl_FrontFacing on gen12+.Jason Ekstrand2019-10-112-2/+25
| | | | | | | | The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch. This implementation isn't an entirely complete as it only works on SIMD8 and SIMD16 and not dual-SIMD8. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/gen11+: Fix CS_OPCODE_CS_TERMINATE codegen.Francisco Jerez2019-10-112-8/+11
| | | | | | | | | | | | | Apparently the ts_request_type and ts_resource_select thread spawner message descriptor bits were removed from the hardware at least since ICL. Drop them in order to avoid assertion failures on Gen12+ platforms which don't have any encoding for this. On Gen9+ these are probably just ignored by the hardware, so this is unlikely to have had any functional implications prior to Gen12. v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio) Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs/gen12: Fix barrier codegen.Francisco Jerez2019-10-112-2/+7
| | | | | | | | The WAIT instruction has been removed, but SYNC.bar can be used instead to wait for a notification on n0.0. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu: Don't set notify descriptor field of gateway barrier message.Francisco Jerez2019-10-111-1/+0
| | | | | | | | | | Apparently this field was removed on SKL, and according to the hardware docs for previous platforms "This field is only valid for a ForwardMsg message. It is ignored for other messages. The BarrierMsg message always increments the N0 notification counter". Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/ir/gen12: Update assert in brw_stage_has_packed_dispatch().Francisco Jerez2019-10-111-1/+1
| | | | | | | Confirmed no regressions after a full Piglit run on TGL with the brw_fs_test_dispatch_packing() test enabled. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu/validate/gen12: Don't blow up on indirect src0.Jason Ekstrand2019-10-111-1/+2
| | | | | | | They look like a NULL source if you don't look at the address mode. Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu/validate/gen12: Validation fixes for SEND instruction.Francisco Jerez2019-10-111-22/+28
| | | | | | | | The following fix-up by Jordan Justen is squashed in: intel/eu/validate: gen12 send instruction doesn't have a dst type field Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu/validate/gen12: Fix validation of SYNC instruction.Francisco Jerez2019-10-111-1/+1
| | | | | | | src0 will typically be null for this instruction. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/eu/validate/gen12: Implement integer multiply restrictions in EU ↵Francisco Jerez2019-10-111-0/+33
| | | | | | | | validator. Due to hardware bug filed as HSDES#1604601757. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/ir: Lower fpow on Gen12.Jordan Justen2019-10-111-0/+1
| | | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>