path: root/src/intel
Commit message (author, date, files changed, lines -removed/+added)
* replace _mesa_logbase2 with util_logbase2 (Dylan Baker, 2020-04-21, 5 files, -8/+8)
| | | | | | | Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
* replace _mesa_is_pow_two with util_is_power_of_two_* (Dylan Baker, 2020-04-21, 2 files, -3/+2)
| | | | | | | | | | | | | Mostly this uses util_is_power_of_two_or_zero, which has the same behavior as _mesa_is_pow_two when the input is zero. In cases where the value is known to be != 0 ahead of time I used the _nonzero variant as it may be faster on some platforms. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
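The difference between the two variants can be sketched as follows. This is a minimal illustration rather than Mesa's implementation: the names `is_pow2_or_zero` and `is_pow2_nonzero` are hypothetical stand-ins, and the population-count builtin assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "or zero" variant: returns true for 0
 * and for any value with exactly one bit set. */
static inline bool is_pow2_or_zero(uint32_t v)
{
    return (v & (v - 1)) == 0;
}

/* Hypothetical stand-in for the "nonzero" variant: intended for callers
 * that already know v != 0. A single population count is one way such a
 * check can be cheaper on some CPUs (assumption: GCC/Clang builtin). */
static inline bool is_pow2_nonzero(uint32_t v)
{
#if defined(__GNUC__)
    return __builtin_popcount(v) == 1;
#else
    return v != 0 && (v & (v - 1)) == 0;
#endif
}
```

Both helpers agree on every nonzero input; they differ only in how zero is treated, which is why the "or zero" form is the drop-in replacement when the input is not known to be nonzero.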
* anv/android: fix assert in anv_import_ahw_memory (Abhishek Kumar, 2020-04-21, 1 file, -1/+1)
| | | | | | | | | | | | Commit fixes assert that triggers when running dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.buffer#bind_export_import_bind on a debug build of Mesa. Fixes: c79a528d ("anv/android: support import/export of AHardwareBuffer objects") Signed-off-by: Abhishek Kumar <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4655>
* nir: Delete the fnoise opcodes (Jason Ekstrand, 2020-04-21, 2 files, -36/+0)
| | | | | | | | | As of the previous commit, they are never used. Reviewed-by: Alyssa Rosenzweig <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
* intel/fs: Coalesce when the src live range is contained in the dst (Jason Ekstrand, 2020-04-21, 1 file, -7/+43)
| Consider the following case:
|
|    // g119-123 are written somewhere above
|    mul.sat(16) g67<1>F g6.4<0,1,0>F g125<8,8,1>F
|    mul.sat(16) g69<1>F g6.5<0,1,0>F g125<8,8,1>F
|    mul.sat(16) g71<1>F g6.6<0,1,0>F g125<8,8,1>F
|    mov(16) g119<1>F g67<8,8,1>F
|    mov(16) g121<1>F g69<8,8,1>F
|    mov(16) g123<1>F g71<8,8,1>F
|
| We should be able to coalesce it into
|
|    mul.sat(16) g119<1>F g6.4<0,1,0>F g125<8,8,1>F
|    mul.sat(16) g121<1>F g6.5<0,1,0>F g125<8,8,1>F
|    mul.sat(16) g123<1>F g6.6<0,1,0>F g125<8,8,1>F
|
| What's stopping us is an overly conservative check for writes to the two registers being coalesced. The check walks over the intersection of their live ranges and checks for no writes to either one. However, because the register which starts the live range (the mul.sat in this case) is inside that intersection, we flag it as a write in the intersection and don't coalesce. However, this case is safe because the destination register of the copy is never read after the source is written.
|
| Shader-db changes on ICL:
|
|    total instructions in shared programs: 16043613 -> 16042610 (<.01%)
|    instructions in affected programs: 43036 -> 42033 (-2.33%)
|    helped: 226
|    HURT: 0
|    helped stats (abs) min: 1 max: 30 x̄: 4.44 x̃: 4
|    helped stats (rel) min: 0.09% max: 26.67% x̄: 4.89% x̃: 3.43%
|    95% mean confidence interval for instructions value: -4.86 -4.02
|    95% mean confidence interval for instructions %-change: -5.57% -4.22%
|    Instructions are helped.
|
|    total cycles in shared programs: 334766372 -> 334710124 (-0.02%)
|    cycles in affected programs: 617548 -> 561300 (-9.11%)
|    helped: 214
|    HURT: 2
|    helped stats (abs) min: 15 max: 1512 x̄: 263.21 x̃: 212
|    helped stats (rel) min: 0.30% max: 75.36% x̄: 25.30% x̃: 21.58%
|    HURT stats (abs) min: 40 max: 40 x̄: 40.00 x̃: 40
|    HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15%
|    95% mean confidence interval for cycles value: -277.91 -242.90
|    95% mean confidence interval for cycles %-change: -27.58% -22.55%
|    Cycles are helped.
|
|    No spill/fill changes or gained/lost
|
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4627>
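The condition named in the commit title, that the source live range is contained in the destination's, can be written as a plain interval test. The sketch below covers only that containment check; `struct live_range` and its inclusive `start`/`end` instruction indices are hypothetical, and the real pass must additionally confirm that the destination is never read after the source is written, which this sketch does not attempt.

```c
#include <stdbool.h>

/* Hypothetical live range: first and last instruction index (inclusive)
 * at which a register is live. */
struct live_range {
    int start;
    int end;
};

/* True when `inner` lies entirely within `outer`. */
static bool range_contains(struct live_range outer, struct live_range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}
```

In the example from the commit message, g119 is written somewhere above the mul.sat and read after the mov, so its range encloses the short range of g67, which lives only from the mul.sat to the mov.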
* intel/fs: Rename block to scan_block in can_coalesce_vars (Jason Ekstrand, 2020-04-21, 1 file, -4/+4)
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4627>
* anv: use common nir_convert_ycbcr (Jonathan Marek, 2020-04-20, 1 file, -123/+8)
| | | | | | | | | Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: D Scott Phillips <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
* anv: Add support for new MMAP_OFFSET ioctl. (Rafael Antognolli, 2020-04-20, 3 files, -5/+49)
| | | | | | | | | | | v2: Update getparam check (Ken). [[email protected]: use 0 offset for MMAP_OFFSET] Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1675>
* anv: Add anv_device parameter to anv_gem_munmap. (Rafael Antognolli, 2020-04-20, 5 files, -7/+8)
| | | | | | | | | | | | Also update all of its callers. On the next commit, the device will be used by anv_gem_munmap to choose whether we need to call the valgrind code or not, depending on which type of mmap we are using. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1675>
* intel/fs,vec4: Properly account SENDs in IVB memory fence (Caio Marcelo de Oliveira Filho, 2020-04-20, 4 files, -8/+20)
| | | | | | | | | | | Change brw_memory_fence to return the number of messages emitted, and use that to update the send_count statistic in code generation. This will fix the book-keeping for IVB since the memory fences will result in two SEND messages. Reviewed-by: Francisco Jerez <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4646>
* anv: Apply any needed PIPE_CONTROLs before emitting state (Jason Ekstrand, 2020-04-19, 1 file, -0/+12)
| | | | | | | | | | | | | Push constants in particular can get picked up by the hardware at weird times that happen *before* 3DPRIMITIVE. Therefore, we need to flush before we emit all our state to ensure that any data they may pick up is in memory in time. This fixes an app which does vkCmdCopyBuffers immediately followed by a vkCmdBeginRenderPass and vkCmdDraw which uses the destination of the copy as a UBO which we push. Cc: [email protected] Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4601>
* anv: Move vb_emit setup closer to where it's used in flush_state (Jason Ekstrand, 2020-04-19, 1 file, -4/+4)
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4601>
* Fix promotion of floats to doubles (Albert Astals Cid, 2020-04-18, 1 file, -1/+1)
| | | | | | | | | Use the f variants of the math functions if the input parameter is a float, saves converting from float to double and running the double variant of the math function for gaining no precision at all Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3969>
* anv: skip writing perfcntr in results on Gen12+ (Lionel Landwerlin, 2020-04-18, 1 file, -0/+4)
| | | | | | | | | | | | We were not capturing the register already so don't bother writing the delta in the results (we were previously doing a delta between two 0 values). v2: Fix unused function warning Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4586>
* intel/perf: Enable MDAPI queries for Gen12 (Lionel Landwerlin, 2020-04-18, 2 files, -5/+8)
| | | | | | | | | | We're missing the cases for gen12 leading to those metrics going missing. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 15b7b56eb2fb41 ("intel/perf: add TGL support") Reviewed-by: Kenneth Graunke <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4586>
* intel/compiler: Fixup operands in fs_builder::emit() that takes array (Ian Romanick, 2020-04-17, 1 file, -1/+10)
| The versions that take a specific number of operands will do various fixups depending on the platform and the opcode. However, the version that takes an array of sources did not. This makes all versions operate similarly.
|
| Reviewed-by: Matt Turner <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/compiler: CSEL can do saturate (Ian Romanick, 2020-04-17, 1 file, -0/+1)
| | | | | Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/compiler: Only GE and L modifiers are commutative for SEL (Ian Romanick, 2020-04-17, 1 file, -1/+5)
| | | | | Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/compiler: Silence unused parameter warning in update_inst_scoreboard (Ian Romanick, 2020-04-17, 1 file, -3/+3)
| | | | | | | | | | src/intel/compiler/brw_fs_scoreboard.cpp: In function ‘void {anonymous}::update_inst_scoreboard(const fs_visitor*, const ordered_address*, const fs_inst*, unsigned int, {anonymous}::scoreboard&)’: src/intel/compiler/brw_fs_scoreboard.cpp:793:45: warning: unused parameter ‘shader’ [-Wunused-parameter] 793 | update_inst_scoreboard(const fs_visitor *shader, const ordered_address *jps, | ~~~~~~~~~~~~~~~~~~^~~~~~ Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/compiler: Silence unused parameter warning in fs_live_variables::setup_one_read (Ian Romanick, 2020-04-17, 2 files, -4/+3)
| src/intel/compiler/brw_fs_live_variables.cpp: In member function ‘void brw::fs_live_variables::setup_one_read(brw::fs_live_variables::block_data*, fs_inst*, int, const fs_reg&)’:
| src/intel/compiler/brw_fs_live_variables.cpp:56:67: warning: unused parameter ‘inst’ [-Wunused-parameter]
|    56 | fs_live_variables::setup_one_read(struct block_data *bd, fs_inst *inst,
|       |                                                          ~~~~~~~~~^~~~
|
| Reviewed-by: Matt Turner <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/compiler: Silence unused parameter warnings in vec4_tcs_visitor (Ian Romanick, 2020-04-17, 1 file, -4/+4)
| | | | | | | | | | | | | | | In file included from src/intel/compiler/brw_vec4_tcs.cpp:31: src/intel/compiler/brw_vec4_tcs.h: In member function ‘virtual void brw::vec4_tcs_visitor::emit_urb_write_header(int)’: src/intel/compiler/brw_vec4_tcs.h:74:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] 74 | virtual void emit_urb_write_header(int mrf) {} | ~~~~^~~ src/intel/compiler/brw_vec4_tcs.h: In member function ‘virtual brw::vec4_instruction* brw::vec4_tcs_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/brw_vec4_tcs.h:75:57: warning: unused parameter ‘complete’ [-Wunused-parameter] 75 | virtual vec4_instruction *emit_urb_write_opcode(bool complete) { return NULL; } | ~~~~~^~~~~~~~ Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
* intel/blorp: Delete an unused enum (Jason Ekstrand, 2020-04-17, 1 file, -15/+0)
| This was lying around from back when BLORP wrote to fs_visitor directly.
|
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4606>
* anv: Emit pushed UBO bounds checking code in the back-end compiler (Jason Ekstrand, 2020-04-17, 5 files, -144/+131)
| This commit fixes performance regressions introduced by e03f9652801ad7 in which we started bounds checking our push constants. This added a LOT of shader code to shaders which use the robustBufferAccess feature and led to substantial spilling. The checking we just added to the FS back-end is far more efficient for two reasons:
|
|  1. It can be done at a whole register granularity rather than per-scalar and so we emit one SIMD8 SEL per 32B GRF rather than one SIMD16 SEL (executed as two SELs) for each component loaded.
|
|  2. Because we do it with NoMask instructions, we can do it on whole pushed GRFs without splatting them out to SIMD8 or SIMD16 values. This means that robust buffer access no longer explodes our register pressure for no good reason.
|
| As a tiny side-benefit, we can now use AND instead of SEL which means no need for the flag and better scheduling.
|
| Vulkan pipeline database results on ICL:
|
|    Instructions in all programs: 293586059 -> 238009118 (-18.9%)
|    SENDs in all programs: 13568515 -> 13568515 (+0.0%)
|    Loops in all programs: 149720 -> 149720 (+0.0%)
|    Cycles in all programs: 88499234498 -> 84348917496 (-4.7%)
|    Spills in all programs: 1229018 -> 184339 (-85.0%)
|    Fills in all programs: 1348397 -> 246061 (-81.8%)
|
| This also improves the performance of a few apps:
|
|  - Shadow of the Tomb Raider: +4%
|  - Witcher 3: +3.5%
|  - UE4 Shooter demo: +2%
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4447>
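The register-granularity check described above can be modelled on the CPU as follows. This is a sketch under stated assumptions, not the back-end code: pushed data is treated as an array of 32-byte registers (8 dwords each), `mask_pushed_regs` and `valid_regs` are hypothetical names, and the C loop stands in for what the commit describes as NoMask AND instructions on whole GRFs.

```c
#include <stdint.h>

#define DWORDS_PER_GRF 8u  /* one 32-byte register holds 8 32-bit values */

/* Zero every 32-byte register of `push` at or beyond index `valid_regs`,
 * leaving in-bounds registers untouched. One mask is computed per
 * register, not per scalar. */
static void mask_pushed_regs(uint32_t *push, unsigned num_regs,
                             unsigned valid_regs)
{
    for (unsigned r = 0; r < num_regs; r++) {
        /* All ones when register r is in bounds, zero otherwise. */
        uint32_t mask = (r < valid_regs) ? UINT32_MAX : 0u;

        for (unsigned i = 0; i < DWORDS_PER_GRF; i++)
            push[r * DWORDS_PER_GRF + i] &= mask;
    }
}
```

Using a bitwise AND with an all-ones or all-zeros mask gives the same result as a select between the data and zero, without needing a condition flag.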
* intel/cfg: Add first/last_block helpers (Jason Ekstrand, 2020-04-17, 1 file, -0/+55)
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4447>
* intel/batch_decoder: Stop printing to stdout (Jason Ekstrand, 2020-04-16, 1 file, -2/+2)
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4597>
* anv: Report correct SLM size (Jason Ekstrand, 2020-04-16, 1 file, -1/+1)
| | | | | | Fixes: d787a2d0 "anv: Implement VK_KHR_pipeline_executable_properties" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4597>
* intel: Add _const versions of prog_data cast helpers (Jason Ekstrand, 2020-04-16, 1 file, -5/+10)
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4597>
* anv: Fix UBO range detection in anv_nir_compute_push_layout (Jason Ekstrand, 2020-04-15, 1 file, -10/+5)
| | | | | | | | | | | This fixes two bugs: First, if the same block index showed up twice, we only pick the first one. Second, we weren't multiplying by 32. This didn't show up in tests because RBA testing is garbage. Found while looking at shaders from the UE4 Shooter demo. Fixes: e03f9652 "anv: Bounds-check pushed UBOs when..." Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4578>
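The second bug is a unit mismatch. The sketch below shows the general shape of such a conversion, assuming push ranges are counted in 32-byte registers while buffer sizes are in bytes. That unit is an assumption here (the commit only says a multiplication by 32 was missing), and `struct push_range` and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PUSH_REG_BYTES 32u  /* assumed size of one push register, in bytes */

/* Hypothetical range descriptor: start and length counted in registers. */
struct push_range {
    uint32_t start;
    uint32_t length;
};

/* Byte offset one past the end of the range. Omitting the multiply would
 * compare a register count against a byte count. */
static uint32_t push_range_end_bytes(struct push_range r)
{
    return (r.start + r.length) * PUSH_REG_BYTES;
}

/* True when a buffer of `size_bytes` bytes fully backs the range. */
static bool push_range_in_bounds(struct push_range r, uint32_t size_bytes)
{
    return push_range_end_bytes(r) <= size_bytes;
}
```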
* anv: Advertise SEND count through VK_EXT_pipeline_executable_properties (Jason Ekstrand, 2020-04-15, 4 files, -0/+13)
| | | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4578>
* intel/compiler: Remove cs_prog_data->threads (Caio Marcelo de Oliveira Filho, 2020-04-09, 2 files, -20/+3)
| | | | | | | | | | At this point all drivers are doing this math on their own -- since most of them need to cover the variable group size case, in which at compile time the group size (and number of threads) is not defined. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Paulo Zanoni <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
* anv: Stop using cs_prog_data->threads (Caio Marcelo de Oliveira Filho, 2020-04-09, 5 files, -6/+32)
| | | | | | | | | | | | | | Move the calculation to helper functions -- similar to what GL already needs to do. This is a preparation for dropping this field since this value is expected to be calculated by the drivers now for variable group size case. And also the field would get in the way of brw_compile_cs producing multiple SIMD variants (like FS). Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Paulo Zanoni <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
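A helper of the kind described might look like the sketch below. It assumes the thread count is the workgroup size divided by the SIMD width, rounded up. The commit message does not spell out the formula, so treat both the formula and the name `cs_num_threads` as assumptions.

```c
#include <assert.h>

/* Hypothetical helper: number of hardware threads needed to cover a
 * workgroup of `group_size` invocations when each thread executes
 * `simd_width` invocations. */
static unsigned cs_num_threads(unsigned group_size, unsigned simd_width)
{
    assert(simd_width > 0);
    return (group_size + simd_width - 1) / simd_width;  /* round up */
}
```

Computing this in the driver rather than storing it in the compiled program data fits the variable group size case, where `group_size` is only known at dispatch time.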
* intel/compiler: Add support for variable workgroup size (Plamena Manolova, 2020-04-09, 5 files, -29/+100)
| Add new builtin parameters that are used to keep track of the group size. This will be used to implement ARB_compute_variable_group_size.
|
| The compiler will use the maximum group size supported to pick a suitable SIMD variant. A later improvement will be to keep all SIMD variants (like FS) so the driver can select the best one at dispatch time.
|
| When variable workgroup size is used, the small workgroup optimization is disabled as we can't prove at compile time that the barriers won't be needed.
|
| Extracted from original i965 patch with additional changes by Caio Marcelo de Oliveira Filho.
|
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
| Reviewed-by: Paulo Zanoni <[email protected]>
| Reviewed-by: Jordan Justen <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
* intel/compiler: Replace cs_prog_data->push.total with a helper (Caio Marcelo de Oliveira Filho, 2020-04-09, 3 files, -8/+18)
| | | | | | | | | | | | | | | The push.total field had three values but only one was directly used (size). Replace it with a helper function that explicitly takes the cs_prog_data and the number of threads -- and use that in the drivers. This is a preparation for ARB_compute_variable_group_size where the number of threads (hence the total size for push constants) is not defined at compile time (not cs_prog_data->threads). Reviewed-by: Paulo Zanoni <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
* anv/gen12: Lower VK_KHR_multiview using Primitive Replication (Caio Marcelo de Oliveira Filho, 2020-04-07, 8 files, -16/+471)
| Identify if view_index is used only for position calculation, and use Primitive Replication to implement Multiview in Gen12. This feature allows storing per-view position information in a single execution of the shader, treating position as an array.
|
| The shader is transformed by adding a for-loop around it that has one iteration per active view (in the view_mask). Stores to the position now store into the position array for the current index in the loop, and load_view_index() will return the view index corresponding to the current index in the loop.
|
| The feature is controlled by setting the environment variable ANV_PRIMITIVE_REPLICATION_MAX_VIEWS, which defaults to 2 if unset. For pipelines with view counts larger than that, the regular instancing will be used instead of Primitive Replication. To disable it completely set the variable to 0.
|
| v2: Don't assume position is set in vertex shader; remove only stores for position; don't apply optimizations since other passes will do; clone shader body without extract/reinsert; don't use last_block (potentially stale). (Jason)
|     Fix view_index immediate to contain the view index, not its order.
|     Check for maximum number of views supported.
|     Add guard for gen12.
|
| v3: Clone the entire shader function and change it before reinsert; disable optimization when shader has memory writes. (Jason)
|     Use a single environment variable with _DEBUG on the name.
|
| v4: Change to use new nir_deref_instr.
|     When removing stores, look for mode nir_var_shader_out instead of walking the list of outputs.
|     Ensure unused derefs are removed in the non-position part of the shader.
|     Remove dead control flow when identifying whether primitive replication can be used.
|
| v5: Consider all the active shaders (including fragment) when deciding that Primitive Replication can be used.
|     Change environment variable to ANV_PRIMITIVE_REPLICATION.
|     Squash the emission of 3DSTATE_PRIMITIVE_REPLICATION into this patch.
|     Disable Prim Rep in blorp_exec_3d.
|
| v6: Use a loop around the shader, instead of manually unrolling, since the regular unroll pass will kick in.
|     Document that we don't expect to see copy_deref or load_deref involving the position variable.
|     Recover use_primitive_replication value when loading pipeline from the cache.
|     Set VARYING_SLOT_LAYER to 0 in the shader. Earlier versions were relying on ForceZeroRTAIndexEnable but that might not be sufficient.
|     Disable Prim Rep in cmd_buffer_so_memcpy.
|
| v7: Don't use Primitive Replication if position is not set, fall back to instancing; change environment variable to be ANV_PRIMITIVE_REPLICATION_MAX_VIEWS and default it to 2 based on experiments.
|
| Reviewed-by: Rafael Antognolli <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
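The environment-variable behaviour described in this message (default of 2 when unset, 0 disables the feature, larger view counts fall back to instancing) can be sketched as below. The function names are hypothetical, treating an empty string like an unset variable is an assumption, and the real driver applies further conditions (such as how view_index is used) that are not modelled here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEFAULT_MAX_VIEWS 2  /* default stated in the commit message */

/* Parse the value of ANV_PRIMITIVE_REPLICATION_MAX_VIEWS. `value` is what
 * getenv() returned, so it may be NULL when the variable is unset.
 * Assumption: an empty string is treated like an unset variable. */
static int parse_max_views(const char *value)
{
    if (value == NULL || *value == '\0')
        return DEFAULT_MAX_VIEWS;

    long v = strtol(value, NULL, 10);
    if (v < 0)
        return 0;
    if (v > INT_MAX)
        return INT_MAX;
    return (int)v;
}

/* Decide between Primitive Replication and regular instancing based only
 * on the view count; 0 disables Primitive Replication entirely. */
static bool use_primitive_replication(int view_count, int max_views)
{
    return max_views > 0 && view_count <= max_views;
}
```

A caller would pass `getenv("ANV_PRIMITIVE_REPLICATION_MAX_VIEWS")` to `parse_max_views` once and reuse the result for each pipeline.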
* intel/fs: Allow multiple slots for position (Caio Marcelo de Oliveira Filho, 2020-04-07, 9 files, -12/+41)
| | | | | | | | | | | | | | | | | | | | | | Change brw_compute_vue_map() to also take the number of pos slots. If more than one slot is used, the VARYING_SLOT_POS is treated as an array. When using Primitive Replication, instead of a single position, the VUE must contain an array of positions. Padding might be necessary (after clip distance) to ensure rest of attributes start aligned. v2: Add note about array in the commit message and assert that pos_slots >= 1 to make clear 0 is invalid. (Jason) Move padding to be after the clip distance. v3: Apply the correct offset when gathering the sources from outputs. Reviewed-by: Jason Ekstrand <[email protected]> [v2] Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
* intel/gen12: Add XML description for 3DSTATE_PRIMITIVE_REPLICATION (Caio Marcelo de Oliveira Filho, 2020-04-07, 1 file, -0/+16)
| | | | | | | | | v2: Use groups for the 16-element arrays "Viewport Offset" and "RTAI Offset". (Ken) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
* intel/nir: Enable load/store vectorization (Jason Ekstrand, 2020-04-03, 1 file, -11/+55)
| This commit enables, for Intel drivers, the I/O vectorization pass that was originally written for ACO. We enable it for UBOs, SSBOs, global memory, and SLM. We only enable vectorization for the scalar back-end because vec4 makes certain alignment assumptions.
|
| Shader-db results with iris on ICL:
|
|    total instructions in shared programs: 16077927 -> 16068236 (-0.06%)
|    instructions in affected programs: 199839 -> 190148 (-4.85%)
|    helped: 324
|    HURT: 0
|    helped stats (abs) min: 2 max: 458 x̄: 29.91 x̃: 4
|    helped stats (rel) min: 0.11% max: 38.94% x̄: 4.32% x̃: 1.64%
|    95% mean confidence interval for instructions value: -37.02 -22.80
|    95% mean confidence interval for instructions %-change: -5.07% -3.58%
|    Instructions are helped.
|
|    total cycles in shared programs: 336806135 -> 336151501 (-0.19%)
|    cycles in affected programs: 16009735 -> 15355101 (-4.09%)
|    helped: 458
|    HURT: 154
|    helped stats (abs) min: 1 max: 77812 x̄: 1542.50 x̃: 75
|    helped stats (rel) min: <.01% max: 34.46% x̄: 5.16% x̃: 2.01%
|    HURT stats (abs) min: 1 max: 22800 x̄: 336.55 x̃: 20
|    HURT stats (rel) min: <.01% max: 17.11% x̄: 2.12% x̃: 1.00%
|    95% mean confidence interval for cycles value: -1596.83 -542.49
|    95% mean confidence interval for cycles %-change: -3.83% -2.82%
|    Cycles are helped.
|
|    total sends in shared programs: 814177 -> 809049 (-0.63%)
|    sends in affected programs: 15422 -> 10294 (-33.25%)
|    helped: 324
|    HURT: 0
|    helped stats (abs) min: 1 max: 256 x̄: 15.83 x̃: 2
|    helped stats (rel) min: 1.33% max: 67.90% x̄: 21.21% x̃: 15.38%
|    95% mean confidence interval for sends value: -19.67 -11.98
|    95% mean confidence interval for sends %-change: -23.03% -19.39%
|    Sends are helped.
|
|    LOST: 7
|    GAINED: 2
|
| Most of the helped shaders were in the following titles:
|
|  - Doom
|  - Deus Ex: Mankind Divided
|  - Aztec Ruins
|  - Shadow of Mordor
|  - DiRT Showdown
|  - Tomb Raider (Rise, I think)
|
| Five of the lost programs are SIMD16 shaders we lost from DiRT Showdown. The other two are compute shaders in Aztec Ruins which switched from SIMD8 to SIMD16.
|
| Vulkan pipeline-db stats on ICL:
|
|    Instructions in all programs: 296780486 -> 293493363 (-1.1%)
|    Loops in all programs: 149669 -> 149669 (+0.0%)
|    Cycles in all programs: 90999206722 -> 88513844563 (-2.7%)
|    Spills in all programs: 1710217 -> 1730691 (+1.2%)
|    Fills in all programs: 1931235 -> 1958138 (+1.4%)
|
| By far the most help was in the Tomb Raider games. A couple of Batman games with DXVK were also helped. In Shadow of the Tomb Raider:
|
|    Instructions in all programs: 41614336 -> 39408023 (-5.3%)
|    Loops in all programs: 32200 -> 32200 (+0.0%)
|    Cycles in all programs: 1875498485 -> 1667034831 (-11.1%)
|    Spills in all programs: 196307 -> 214945 (+9.5%)
|    Fills in all programs: 282736 -> 307113 (+8.6%)
|
| Benchmarks of real games I've done on this patch:
|
|  - Rise of the Tomb Raider: +3%
|  - Shadow of the Tomb Raider: +10%
|
| Reviewed-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
* intel/nir: Lower memory access bit sizes later (Jason Ekstrand, 2020-04-03, 1 file, -2/+12)
| | | | | | | | | | | We're about to do load/store vectorization right before this but we need that to happen after we've done a round of optimization. Otherwise, we'll be getting unoptimized NIR in from ANV and the vectorizer won't be able to do anything with it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
* anv: Improve brw_nir_lower_mem_access_bit_sizes (Jason Ekstrand, 2020-04-03, 1 file, -5/+5)
| | | | | | | | | | This commit makes us take both bit size and alignment into account so that we can properly handle cases such as when we have a 32-bit store to an 8-bit-aligned address. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
* intel/fs: Choose memory message type based on bit size (Jason Ekstrand, 2020-04-03, 1 file, -30/+42)
| | | | | | | | | | | | | | Thanks to the NIR vectorizing pass, we're about to see alignments that are higher than the bit size. Previously, we could use either and we just happened to choose alignment (probably the wrong choice) so it's harmless to switch to detecting based on bit size. This commit changes things to take both into account which is more accurate to what the messages we're using do. We also beef up the asserts and make them more consistent, more accurate, and more complete. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
* intel/aub_viewer: fix access to freed memory (Lionel Landwerlin, 2020-04-03, 1 file, -1/+1)
| Closing a window while we're displaying the list of windows can lead to accesses to freed memory, so use the safe iterators on the list of windows.
|
| Signed-off-by: Lionel Landwerlin <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
| Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4430>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4430>
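The idea behind a "safe" iterator is to read the link to the next element before the current element can be freed. The sketch below shows that pattern on a hand-rolled singly linked list; `struct window` and `destroy_closed_windows` are hypothetical and are not the aub_viewer code or Mesa's list macros.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical window record in a singly linked list. */
struct window {
    bool closed;
    struct window *next;
};

/* Remove and free every closed window. `link` always points at the
 * pointer that refers to the current element, so unlinking needs no
 * separate "previous" pointer. */
static void destroy_closed_windows(struct window **head)
{
    struct window **link = head;

    while (*link != NULL) {
        struct window *w = *link;

        if (w->closed) {
            *link = w->next;  /* read w->next before w is freed */
            free(w);
        } else {
            link = &w->next;
        }
    }
}
```

A naive loop that advances with `w = w->next` after calling `free(w)` would read freed memory, which is the class of bug the commit describes.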
* anv/image: Use align_u64 for image offsets (Jason Ekstrand, 2020-04-02, 1 file, -2/+2)
| | | | | | | | | | | | | The ALIGN functions in util/u_math.h work on uintptr_t whose size changes depending on your platform. Use ones which take an explicit 64-bit type instead to avoid 32-bit platform issues. Cc: [email protected] Reported-by: Mark Janes <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4414> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4414>
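An alignment helper with an explicit 64-bit type can be sketched as follows. The name `align_up_u64` is hypothetical (it is not claimed to match the helper the commit uses), and the alignment is assumed to be a nonzero power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round `v` up to the next multiple of `a`, where `a` must be a nonzero
 * power of two. All arithmetic is done in 64 bits regardless of the
 * platform's pointer size. */
static inline uint64_t align_up_u64(uint64_t v, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}
```

If the same computation went through a pointer-sized integer on a 32-bit platform, an offset such as 0x100000001 would first be truncated to 1 and then aligned to 4096, silently losing the upper bits.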
* anv/pipeline: allow more than 16 FS inputs (Juan A. Suarez Romero, 2020-04-01, 1 file, -14/+21)
| | | | | | | | | | | | | | A fragment shader can have more than 16 inputs, so SBE emission should deal with all of them. This fixes dEQP-VK.pipeline.max_varyings.* Cc: [email protected] Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ivan Briano <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010>
* intel/compiler: store the FS inputs in WM prog data (Juan A. Suarez Romero, 2020-04-01, 2 files, -0/+6)
| | | | | | | | | | | Store the fragment shader inputs in the program data so we can use them later when required without needing the NIR shader. Cc: [email protected] Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ivan Briano <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010>
* anv: use urb_setup_attribs in SBE (Juan A. Suarez Romero, 2020-04-01, 1 file, -3/+3)
| Avoid looping over all VARYING_SLOT_MAX urb_setup array entries.
|
| Cc: [email protected]
| Signed-off-by: Juan A. Suarez Romero <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Ivan Briano <[email protected]>
| Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2010>
* anv: Do not sample from 3d depth image with HiZ (Danylo Piliaiev, 2020-04-01, 1 file, -0/+10)
| | | | | | | | | | | | | | | | | | | For Gen8-11, there are some restrictions around sampling from HiZ. The Skylake PRM docs for RENDER_SURFACE_STATE::AuxiliarySurfaceMode say: "If this field is set to AUX_HIZ, Number of Multisamples must be MULTISAMPLECOUNT_1, and Surface Type cannot be SURFTYPE_3D." Fixes: dEQP-VK.geometry.layered.3d.*.readback Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2720 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Arcady Goldmints-Orlov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4409> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4409>
* nir/algebraic: Distribute source modifiers into instructions (Ian Romanick, 2020-04-01, 1 file, -0/+7)
| There are three main classes of cases that are helped by this change:
|
|  1. When the negation is applied to a value being type converted (e.g., float(-x)). This could possibly also be handled with more clever code generation.
|
|  2. When the negation is applied to a phi node source (e.g., x = -(...); at the end of a basic block). This was the original case that caught my attention while looking at shader-db dumps.
|
|  3. When the negation is applied to the source of an instruction that cannot have source modifiers. This includes texture instructions and math box instructions on pre-Gen7 platforms (see more details below).
|
| In many of these cases the negation can be propagated into the instructions that generate the value (e.g., -(a*b) = (-a)*b).
|
| In addition to the operations implemented in this patch, I also tried:
|
|  - frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on pre-Gen7. On Gen6 and earlier, frcp is a math box instruction, and math box instructions cannot have source modifiers. I suspect this is why so many more shaders are helped on Gen6 than on Gen5 or Gen7. Gen6 supports OpenGL 3.3, so a lot more shaders compile on it. A lot of these shaders may have things like cos(-x) or rcp(-x) that could result in an explicit negation instruction.
|
|  - bcsel - Hurt a few shaders with none helped.
bcsel operates on integer sources, so the fabs or fneg cannot be a source modifier in the bcsel itself. - Integer instructions - No changes on any Intel platform. Some notes about the shader-db results below. - On Tiger Lake, a single Deus Ex fragment shader is hurt for both spills and fills. - On Haswell, a different Deus Ex fragment shader is hurt for both spills and fills. - On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2 (very high graphics settings, lol) fragment shader that upgrades from SIMD8 to SIMD16. v2: Add support for fsign. Add some patterns that remove redundant negations and redundant absolute value rather than trying to push them down the tree. Tiger Lake total instructions in shared programs: 17611333 -> 17586465 (-0.14%) instructions in affected programs: 3033734 -> 3008866 (-0.82%) helped: 10310 HURT: 632 helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1 helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01% HURT stats (abs) min: 1 max: 47 x̄: 3.21 x̃: 2 HURT stats (rel) min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63% 95% mean confidence interval for instructions value: -2.33 -2.21 95% mean confidence interval for instructions %-change: -1.32% -1.27% Instructions are helped. total cycles in shared programs: 338365223 -> 338262252 (-0.03%) cycles in affected programs: 125291811 -> 125188840 (-0.08%) helped: 5224 HURT: 2031 helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12 helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97% HURT stats (abs) min: 1 max: 2882 x̄: 69.50 x̃: 14 HURT stats (rel) min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74% 95% mean confidence interval for cycles value: -18.71 -9.68 95% mean confidence interval for cycles %-change: -0.80% -0.63% Cycles are helped. 
total spills in shared programs: 8942 -> 8946 (0.04%) spills in affected programs: 8 -> 12 (50.00%) helped: 0 HURT: 1 total fills in shared programs: 9399 -> 9401 (0.02%) fills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 Ice Lake total instructions in shared programs: 16124348 -> 16102258 (-0.14%) instructions in affected programs: 2830928 -> 2808838 (-0.78%) helped: 11294 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1 helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72% 95% mean confidence interval for instructions value: -1.99 -1.93 95% mean confidence interval for instructions %-change: -1.34% -1.29% Instructions are helped. total cycles in shared programs: 335393932 -> 335325794 (-0.02%) cycles in affected programs: 123834609 -> 123766471 (-0.06%) helped: 5034 HURT: 2128 helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11 helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00% HURT stats (abs) min: 1 max: 2634 x̄: 70.63 x̃: 16 HURT stats (rel) min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62% 95% mean confidence interval for cycles value: -13.66 -5.37 95% mean confidence interval for cycles %-change: -0.69% -0.48% Cycles are helped. LOST: 0 GAINED: 2 Skylake total instructions in shared programs: 14949240 -> 14927930 (-0.14%) instructions in affected programs: 2594756 -> 2573446 (-0.82%) helped: 11000 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.91 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. 
total cycles in shared programs: 324829346 -> 324821596 (<.01%) cycles in affected programs: 121566087 -> 121558337 (<.01%) helped: 4611 HURT: 2147 helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10 helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00% HURT stats (abs) min: 1 max: 2551 x̄: 67.88 x̃: 16 HURT stats (rel) min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89% 95% mean confidence interval for cycles value: -4.25 1.96 95% mean confidence interval for cycles %-change: -0.28% -0.02% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 14971203 -> 14949957 (-0.14%) instructions in affected programs: 2635699 -> 2614453 (-0.81%) helped: 10982 HURT: 2 helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1 helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76% 95% mean confidence interval for instructions value: -1.97 -1.90 95% mean confidence interval for instructions %-change: -1.42% -1.37% Instructions are helped. total cycles in shared programs: 336215033 -> 336086458 (-0.04%) cycles in affected programs: 127383198 -> 127254623 (-0.10%) helped: 4884 HURT: 1963 helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12 helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05% HURT stats (abs) min: 1 max: 3401 x̄: 63.33 x̃: 16 HURT stats (rel) min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70% 95% mean confidence interval for cycles value: -29.99 -7.57 95% mean confidence interval for cycles %-change: -0.89% -0.71% Cycles are helped. 
total fills in shared programs: 24905 -> 24901 (-0.02%) fills in affected programs: 117 -> 113 (-3.42%) helped: 4 HURT: 0 LOST: 0 GAINED: 16 Haswell total instructions in shared programs: 13148927 -> 13131528 (-0.13%) instructions in affected programs: 2220941 -> 2203542 (-0.78%) helped: 8017 HURT: 4 helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93% HURT stats (abs) min: 1 max: 7 x̄: 2.50 x̃: 1 HURT stats (rel) min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91% 95% mean confidence interval for instructions value: -2.21 -2.13 95% mean confidence interval for instructions %-change: -1.43% -1.37% Instructions are helped. total cycles in shared programs: 321221791 -> 321079870 (-0.04%) cycles in affected programs: 126886055 -> 126744134 (-0.11%) helped: 4674 HURT: 1729 helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16 helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05% HURT stats (abs) min: 1 max: 3694 x̄: 70.58 x̃: 18 HURT stats (rel) min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90% 95% mean confidence interval for cycles value: -33.31 -11.02 95% mean confidence interval for cycles %-change: -0.99% -0.78% Cycles are helped. total spills in shared programs: 19872 -> 19874 (0.01%) spills in affected programs: 21 -> 23 (9.52%) helped: 0 HURT: 1 total fills in shared programs: 20941 -> 20941 (0.00%) fills in affected programs: 62 -> 62 (0.00%) helped: 1 HURT: 1 LOST: 0 GAINED: 8 Ivy Bridge total instructions in shared programs: 11875553 -> 11853839 (-0.18%) instructions in affected programs: 1553112 -> 1531398 (-1.40%) helped: 7304 HURT: 3 helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2 helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94% 95% mean confidence interval for instructions value: -3.04 -2.90 95% mean confidence interval for instructions %-change: -1.65% -1.59% Instructions are helped. 
total cycles in shared programs: 178246425 -> 178184484 (-0.03%) cycles in affected programs: 13702146 -> 13640205 (-0.45%) helped: 4409 HURT: 1566 helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13 helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02% HURT stats (abs) min: 1 max: 356 x̄: 29.48 x̃: 10 HURT stats (rel) min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70% 95% mean confidence interval for cycles value: -11.60 -9.14 95% mean confidence interval for cycles %-change: -1.19% -0.99% Cycles are helped. LOST: 0 GAINED: 10 Sandy Bridge total instructions in shared programs: 10695740 -> 10667483 (-0.26%) instructions in affected programs: 2337607 -> 2309350 (-1.21%) helped: 10720 HURT: 1 helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2 helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04% 95% mean confidence interval for instructions value: -2.69 -2.58 95% mean confidence interval for instructions %-change: -1.57% -1.51% Instructions are helped. total cycles in shared programs: 153478839 -> 153416223 (-0.04%) cycles in affected programs: 22050900 -> 21988284 (-0.28%) helped: 5342 HURT: 2200 helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16 helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86% HURT stats (abs) min: 1 max: 335 x̄: 20.93 x̃: 6 HURT stats (rel) min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30% 95% mean confidence interval for cycles value: -9.18 -7.42 95% mean confidence interval for cycles %-change: -0.82% -0.71% Cycles are helped. 
Iron Lake total instructions in shared programs: 8114882 -> 8105574 (-0.11%) instructions in affected programs: 1232504 -> 1223196 (-0.76%) helped: 4109 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.31 -2.21 95% mean confidence interval for instructions %-change: -1.01% -0.96% Instructions are helped. total cycles in shared programs: 188504036 -> 188466296 (-0.02%) cycles in affected programs: 31203798 -> 31166058 (-0.12%) helped: 3447 HURT: 36 helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13% HURT stats (abs) min: 2 max: 30 x̄: 7.33 x̃: 6 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10% 95% mean confidence interval for cycles value: -11.16 -10.51 95% mean confidence interval for cycles %-change: -0.22% -0.20% Cycles are helped. LOST: 0 GAINED: 1 GM45 total instructions in shared programs: 4989697 -> 4984531 (-0.10%) instructions in affected programs: 703952 -> 698786 (-0.73%) helped: 2493 HURT: 2 helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1 helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65% 95% mean confidence interval for instructions value: -2.13 -2.01 95% mean confidence interval for instructions %-change: -1.07% -0.99% Instructions are helped. 
total cycles in shared programs: 128929136 -> 128903886 (-0.02%) cycles in affected programs: 21583096 -> 21557846 (-0.12%) helped: 2214 HURT: 17 helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8 helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13% HURT stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09% 95% mean confidence interval for cycles value: -11.75 -10.88 95% mean confidence interval for cycles %-change: -0.25% -0.22% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
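The identity quoted in the message above, -(a*b) = (-a)*b, along with the v2 note about removing redundant negations, can be sketched on a toy expression tree. This is only an illustration: the tuple encoding and the `push_fneg` helper are hypothetical and are not the patterns the patch adds to nir_opt_algebraic.

```python
# Toy expression trees: a tuple is (opcode, *sources); a string is a leaf value.
# Hypothetical illustration only; not the actual nir_opt_algebraic patterns.

def push_fneg(expr):
    """Distribute fneg into the instruction that produced its source."""
    if isinstance(expr, str):
        return expr
    op, *srcs = expr
    # Rewrite the sources first so nested negations are handled bottom-up.
    srcs = [push_fneg(s) for s in srcs]
    if op == "fneg" and not isinstance(srcs[0], str):
        inner_op, *inner = srcs[0]
        if inner_op == "fneg":
            # -(-a) = a: drop the redundant pair of negations.
            return inner[0]
        if inner_op == "fmul":
            # -(a * b) = (-a) * b: the negation moves onto one multiplicand.
            return ("fmul", push_fneg(("fneg", inner[0])), inner[1])
    return (op, *srcs)


negated_product = push_fneg(("fneg", ("fmul", "a", "b")))
double_negation = push_fneg(("fneg", ("fneg", "x")))
cancelled = push_fneg(("fneg", ("fmul", ("fneg", "a"), "b")))
```

After the rewrite, the fneg sits on a source of the fmul, where a backend with source modifiers could fold it, instead of feeding a consumer (a type conversion, a phi, a texture or math box instruction) that cannot take one.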
* intel/vec4: Allow late copy propagation on vec4  Ian Romanick  2020-04-01  1  -3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change incurs a small amount of hurt now, but it enables a lot of benefit on vec4 shaders in the next commit. nir_opt_algebraic_late converts dph, dot3, etc. to dph_replicated, dot_replicated3, etc. In the process, it introduces extra moves. If the original NIR contained vec1 32 ssa_45 = fdot4 ssa_51, ssa_44 vec1 32 ssa_46 = fneg ssa_45 nir_opt_algebraic_late will produce vec4 32 ssa_18 = fdot_replicated4 ssa_1, ssa_15 vec1 32 ssa_19 = mov ssa_18.x vec1 32 ssa_17 = fneg ssa_19 The algebraic pass added in the next commit can't see through the move to know that the fneg applies to a fdot_replicated4. Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown) total cycles in shared programs: 187077604 -> 187079858 (<.01%) cycles in affected programs: 350132 -> 352386 (0.64%) helped: 174 HURT: 194 helped stats (abs) min: 2 max: 124 x̄: 23.60 x̃: 16 helped stats (rel) min: 0.12% max: 15.88% x̄: 4.98% x̃: 3.86% HURT stats (abs) min: 2 max: 164 x̄: 32.78 x̃: 16 HURT stats (rel) min: 0.17% max: 22.82% x̄: 6.46% x̃: 0.86% 95% mean confidence interval for cycles value: 2.04 10.21 95% mean confidence interval for cycles %-change: 0.17% 1.93% Cycles are HURT. No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
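Why the extra mov matters can be sketched with a toy SSA table built from the three instructions quoted in the message above. The dictionary encoding and the helpers `copy_propagate` and `producer_opcode` are hypothetical and only model the idea; they are not Mesa's NIR API, and the sketch follows a single level of mov rather than chains.

```python
# Toy SSA program: name -> (opcode, sources). Hypothetical model, not Mesa's IR API.
program = {
    "ssa_18": ("fdot_replicated4", ["ssa_1", "ssa_15"]),
    "ssa_19": ("mov", ["ssa_18.x"]),
    "ssa_17": ("fneg", ["ssa_19"]),
}


def copy_propagate(prog):
    """Rewrite each source that names a mov result to the mov's own source."""
    out = {}
    for name, (op, srcs) in prog.items():
        new_srcs = []
        for s in srcs:
            producer = prog.get(s)
            if producer is not None and producer[0] == "mov":
                # Use the value the mov copied instead of the copy itself.
                s = producer[1][0]
            new_srcs.append(s)
        out[name] = (op, new_srcs)
    return out


def producer_opcode(prog, src):
    """Opcode of the instruction defining src, ignoring any .x-style swizzle."""
    base = src.split(".")[0]
    return prog[base][0] if base in prog else None


before = producer_opcode(program, program["ssa_17"][1][0])
propagated = copy_propagate(program)
after = producer_opcode(propagated, propagated["ssa_17"][1][0])
```

Before propagation, a pattern anchored on the fneg sees only a mov as the producer of its source; afterwards it sees the fdot_replicated4 directly, which is the situation the message says the next commit's algebraic pass needs.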
* isl: don't warn in physical extent calculation for yuv formats  Lionel Landwerlin  2020-03-31  2  -2/+8
| | | | | | | | | | | | | | | Those formats already have correct descriptions, with the exception of the planar format. In that case we introduce an assert. This is fine because we don't use the planar format in any of our drivers. There are restrictions on how the addresses of the 2 planes are relative to one another which make this annoying. The sampler is also more limited than what we can do with a shader snippet. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999>
* isl: set bpb for Y8_UNORM  Lionel Landwerlin  2020-03-31  1  -1/+1
| | | | | | | | | | | | This isn't a format we use in any of the drivers but for consistency just give it a correct bpb. We also set the luminance in the G channel. We can't actually use this format with the 3D sampler (only media). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2999>