summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* intel/fs: Allow cmod propagation to instructions with saturate modifierIan Romanick2019-05-142-9/+528
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Add unit tests. Suggested by Matt. All Intel GPUs had similar results. (Ice Lake shown) total instructions in shared programs: 17229441 -> 17228658 (<.01%) instructions in affected programs: 159574 -> 158791 (-0.49%) helped: 489 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1 helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.72 -1.48 95% mean confidence interval for instructions %-change: -0.64% -0.58% Instructions are helped. total cycles in shared programs: 360944149 -> 360937144 (<.01%) cycles in affected programs: 1072195 -> 1065190 (-0.65%) helped: 254 HURT: 27 helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9 helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24% HURT stats (abs) min: 2 max: 83 x̄: 27.56 x̃: 24 HURT stats (rel) min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16% 95% mean confidence interval for cycles value: -30.11 -19.75 95% mean confidence interval for cycles %-change: -0.70% -0.41% Cycles are helped. Reviewed-by: Matt Turner <[email protected]> [v1]
* intel/fs/ra: Spill without destroying the interference graphJason Ekstrand2019-05-141-13/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of re-building the interference graph every time we spill, we modify it in place so we can avoid recalculating liveness and the whole O(n^2) interference graph building process. We make a simplifying assumption in order to do so which is that all spill/fill temporary registers live for the entire duration of the instruction around which we're spilling. This isn't quite true because a spill into the source of an instruction doesn't need to interfere with its destination, for instance. Not re-calculating liveness also means that we aren't adjusting spill costs based on the new liveness. The combination of these things results in a bit of churn in spilling. It takes a large cut out of the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311360 (<.01%) instructions in affected programs: 77027 -> 77163 (0.18%) helped: 11 HURT: 18 total cycles in shared programs: 355544739 -> 355830749 (0.08%) cycles in affected programs: 203273745 -> 203559755 (0.14%) helped: 234 HURT: 190 total spills in shared programs: 12049 -> 12042 (-0.06%) spills in affected programs: 2465 -> 2458 (-0.28%) helped: 9 HURT: 16 total fills in shared programs: 25112 -> 25165 (0.21%) fills in affected programs: 6819 -> 6872 (0.78%) helped: 11 HURT: 16 Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%) Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Put the VGRFs at the end of the nodesJason Ekstrand2019-05-141-18/+32
| | | | | | | This is slightly less convenient in some places but it will make it much easier when we want to start adding nodes dynamically. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Re-arrange interference setupJason Ekstrand2019-05-141-217/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old code was arranged by the type of interference being added. It would set up payload registers and then add payload interference for all VGRFs. It would set up MRFs and add MRF interference for all VGRFs. This commit re-arranges things to be organized differently. It first creates and sets up all RA nodes and then groups interference into two new categories: live range and instruction interference. Once all the RA nodes have been set up, it walks the list of VGRFs and sets up their live range interference and then walks the list of instructions and sets up instruction interference. This new arrangement will be advantageous for a future patch but, at the moment, it cuts 2% off the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311224 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355544739 -> 355544739 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%) Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Do the spill loop inside RAJason Ekstrand2019-05-142-21/+28
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Only add MRF hack interference if we're spillingJason Ekstrand2019-05-141-62/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | The only use of the MRF hack these days is for spilling and there we don't need the precise MRF usage information. If we're spilling then we know pretty well how many MRFs are going to be used. It is possible if the only things that are spilled have fewer SIMD channels than the dispatch width of the shader that this may be more MRFs than needed. That's a risk we're willing to takd. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311224 (<.01%) instructions in affected programs: 16664 -> 16788 (0.74%) helped: 1 HURT: 5 total cycles in shared programs: 355543197 -> 355544739 (<.01%) cycles in affected programs: 731864 -> 733406 (0.21%) helped: 3 HURT: 6 The hurt shaders are all SIMD32 compute shaders where we reserve enough space for a 32-wide spill/fill but don't need it. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Pull the guts of RA into its own classJason Ekstrand2019-05-142-76/+103
| | | | | | | | This accomplishes two things. First, it makes interfaces which are really private to RA private to RA. Second, it gives us a place to store some common stuff as we go through the algorithm. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Move assign_regs further down in the fileJason Ekstrand2019-05-141-70/+70
| | | | | | | It's the main function from which all the other functions are called. It belongs at the bottom. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Split building the interference graph into a helperJason Ekstrand2019-05-141-23/+42
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Initialize grf_used with first_non_payload_grfJason Ekstrand2019-05-141-1/+1
| | | | | | | | | There's no reason why we need to use the calculated payload_node_count value which is just first_non_payload_grf aligned up. The grf_used value will be aligned up to 16 anyway (which is a much bigger alignment) before being handed off to hardware. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Stop adding RA interference to too many SENDS nodesJason Ekstrand2019-05-141-8/+3
| | | | | | | | | | | | | | | | | | | | We only have one node per VGRF so this was adding way too much interference. No idea how we didn't catch this before. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355543197 (0.02%) cycles in affected programs: 2472492 -> 2547639 (3.04%) helped: 17 HURT: 20 Fixes: 014edff0d20d "intel/fs: Add interference between SENDS sources" Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs/ra: Only add dest interference to sources that existJason Ekstrand2019-05-141-1/+1
| | | | | Fixes: 83dedb6354d "i965: Add src/dst interference for certain" Reviewed-by: Kenneth Graunke <[email protected]>
* intel/fs: Stop doing extra RA callsJason Ekstrand2019-05-141-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop. There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem. Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%) Reviewed-by: Kenneth Graunke <[email protected]>
* isl: Add restrictions to isl_surf_get_hiz_surf()Nanley Chery2019-05-141-0/+25
| | | | | | | Import some restrictions from intel_tiling_supports_hiz() and intel_miptree_supports_hiz(). Reviewed-by: Rafael Antognolli <[email protected]>
* isl: Add restriction and comments to isl_surf_get_ccs_surf()Nanley Chery2019-05-141-1/+17
| | | | | | Import some restrictions and comments from intel_miptree_supports_ccs(). Reviewed-by: Rafael Antognolli <[email protected]>
* isl: Modify restrictions in isl_surf_get_mcs_surf()Nanley Chery2019-05-141-5/+19
| | | | | | | Import some restrictions from intel_miptree_supports_mcs() and don't assume that the caller knows which device generations are supported. Reviewed-by: Rafael Antognolli <[email protected]>
* anv: Implement VK_KHR_uniform_buffer_standard_layoutJason Ekstrand2019-05-132-0/+8
| | | | | | | | There's no real work to do here since we already support scalar block layout which is a direct superset of what this extension allows. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Lionel Landwerlin <[email protected]>
* intel/tools: Fix build with glibc < 2.27.Vinson Lee2019-05-131-0/+3
| | | | | | | | | | | | | | | | | | | | | glibc < 2.27 defines OVERFLOW in /usr/include/math.h. This patch fixes this build error. In file included from ../include/c99_math.h:37:0, from ../src/util/u_math.h:44, from ../src/mesa/main/macros.h:35, from ../src/intel/compiler/brw_reg.h:47, from ../src/intel/tools/i965_asm.h:32, from ../src/intel/tools/i965_gram.y:29: src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant OVERFLOW = 412, ^ Fixes: 70308a5a8a80 ("intel/tools: New i965 instruction assembler tool") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656 Signed-off-by: Vinson Lee <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* intel: drop misleading driver name from gen_get_device_info()Mike Blumenkrantz2019-05-111-1/+1
|
* anv: Fix limits when VK_EXT_descriptor_indexing is usedCaio Marcelo de Oliveira Filho2019-05-101-9/+14
| | | | | | | | | | | | | | | | | | | | | | Update various limits in VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously zero to their values from VkPhysicalDeviceLimits. When using VK_EXT_descriptor_indexing, the former limits will apply to all the descriptor layout sets -- not only those using the new feature bits. For the reference, VK_EXT_descriptor_indexing says "There are new descriptor set layout and descriptor pool creation flags that are required to opt in to the update-after-bind functionality, and there are separate maxPerStage* and maxDescriptorSet* limits that apply to these descriptor set layouts which may be much higher than the pre-existing limits. The old limits only count descriptors in non-updateAfterBind descriptor set layouts, and the new limits count descriptors in all descriptor set layouts in the pipeline layout." Fixes: 6e230d7607f "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <[email protected]>
* nir: allow specifying a set of opcodes in lower_alu_to_scalarJonathan Marek2019-05-101-2/+2
| | | | | | | | | This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* intel/fs/copy-prop: Don't walk all the ACPs for each instructionJason Ekstrand2019-05-101-11/+68
| | | | | | | | | | | | | | | | | | | | | In order to set up KILL sets, the dataflow code was walking the entire array of ACPs for every instruction. If you assume the number of ACPs increases roughly with the number of instructions, this is O(n^2). As it turns out, regions_overlap() is not nearly as cheap as one would like and shows up as a significant chunk on perf traces. This commit changes things around and instead first builds an array of exec_lists which it uses like a hash table (keyed off ACP source or destination) similar to what's done in the rest of the copy-prop code. By first walking the list of ACPs and populating the table and then walking instructions and only looking at ACPs which probably have the same VGRF number, we can reduce the complexity to O(n). This takes the execution time of the piglit vs-isnan-dvec test from about 56.4 seconds on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 38.7 seconds. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs/copy-prop: Purge unused ACPsJason Ekstrand2019-05-101-0/+19
| | | | | | | | | | | | | | If the destination of an ACP entry exists only within this block, then there's no need to keep it for dataflow analysis. We can delete it from the out_acp table and avoid growing the bitsets any bigger than we absolutely have to. This reduces the maximum number of global ACP entries in the vs-isnan-dvec with software fp64 on Kaby Lake from 8630 to 3942 and takes the execution time of the piglit vs-isnan-dvec test from about 1:16.2 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 56.4 seconds. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/fs/copy-prop: Bump the hash table size to 64Jason Ekstrand2019-05-101-1/+1
| | | | | | | | | | While the number of ACPs is generally not huge compared to the number of blocks, 16 does seem a bit small. Bumping it to 64 takes the execution time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv: Remove special allocation for anv_push_constantsCaio Marcelo de Oliveira Filho2019-05-093-79/+7
| | | | | | | | | | | | | The key reason for that mechanism is gone: all the extra optional data that could be in the anv_push_constants was moved elsewhere. At this point, just put anv_push_constants directly in anv_cmd_state (part of anv_cmd_buffer). v2: Remove a NULL check we don't need anymore in anv_cmd_buffer_push_constants(). (Lionel) Fix size we consider for valid push params. (Lionel) Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use corresponding type from the vector allocationLionel Landwerlin2019-05-092-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We didn't notice this issue much because the 2 struct share a similar layout, expect for the additional fields... We run into that issue in Anv : ==15236== Invalid write of size 8 ==15236== at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211) ==15236== by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264) ==15236== by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312) ==15236== by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167) ==15236== by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190) ==15236== by 0x8CF60871: alloc_surface_state (anv_image.c:1122) ==15236== by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519) ==15236== by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358) ==15236== Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd ==15236== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==15236== by 0x8D2578E6: u_vector_init (u_vector.c:47) ==15236== by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168) ==15236== by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921) ==15236== by 0x8CF56517: anv_CreateDevice (anv_device.c:1909) ==15236== by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073) ==15236== by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so) ==15236== by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so) ==15236== by 0x8BCB35C6: loader_create_device_chain (loader.c:5449) ==15236== by 0x8BCBC230: vkCreateDevice (trampoline.c:838) v2: Rename mmap_cleanups to avoid confusion (Caio) v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio) Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648 Cc: <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* i965_asm: avoid free()ing uninitialized pointersEric Engestrom2019-05-091-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* i965_asm: fix memleakEric Engestrom2019-05-091-0/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: fix use after freeLionel Landwerlin2019-05-081-3/+3
| | | | | | | | Once mem->bo is removed from the cache, it is likely to be freed. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: b80930a6fea075 ("anv: add support for VK_EXT_memory_budget") Reviewed-by: Eric Engestrom <[email protected]>
* anv: rework queries writes to ensure ordering memory writesLionel Landwerlin2019-05-081-17/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use a mix of MI & PIPE_CONTROL commands to write our queries' data (results & availability). Those commands' memory write order is not guaranteed with regard to their order in the command stream, unless CS stalls are inserted between them. This is problematic for 2 reasons : 1. We copy results from the device using MI commands even though the values are generated from PIPE_CONTROL, meaning we could copy unlanded values into the results and then copy the availability that is inconsistent with the values. 2. We allow the user to poll on the availability values of the query pool from the CPU. If the availability lands in memory before the values then we could return invalid values. This change does 2 things to address this problem : - We use either PIPE_CONTROL or MI commands to write both queries values and availability, so that the ordering of the memory writes guarantees that if availability is visible, results are also visible. - For the occlusion & timestamp queries we apply a CS stall before copying the results on the device, to ensure copying with MI commands see the correct values of previous PIPE_CONTROL writes of availability (required by the Vulkan spec). Signed-off-by: Lionel Landwerlin <[email protected]> Reported-by: Iago Toral Quiroga <[email protected]> Cc: [email protected] Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Unset flag reg when FB write is not predicatedMatt Turner2019-05-071-0/+1
| | | | | | | | | | | In the FS IR we pretend that the instruction is predicated with (+f0.1) just for flag dependency tracking purposes. Since the instruction doesn't support predication before Haswell, we unset the predicate so we should also unset the flag register so that we can round-trip the disassembly. Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/disasm: Disassemble immediate value properly for dimSagar Ghuge2019-05-071-3/+12
| | | | | | | | | | On haswell, for dim instruction we encode immediate float value operand into double float, v2: Fix comment (Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/disasm: Disassemble JIP offset for whileSagar Ghuge2019-05-071-1/+2
| | | | | | Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Replicate 16 bit immediate value correctlySagar Ghuge2019-05-072-0/+6
| | | | | | | | | | | | For the W or UW (signed or unsigned word) source types, the 16-bit value must be replicated in both the low and high words of the 32-bit immediate value. v2: Fix replication in other places as well V3: fix a few nits (Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Print quad value in hex formatSagar Ghuge2019-05-071-1/+1
| | | | | | | | | | Print quad value same as unsigned quad so that we can distinguish in between quater control disassembled values for e.g 1/2/3[Q] and immediate quad value for e.g 1Q. This allows round-tripping through the assembler/disassembler. Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/tools: Add unit tests for assemblerSagar Ghuge2019-05-07594-0/+28756
| | | | | | | | | v1: Pass executable object from meson to test(Dylan Baker) v2: Ignore generated output files from git status(Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* intel/tools: Initialize offset correctly for i965_asmMika Kuoppala2019-05-071-10/+7
| | | | | | | | | | If we leave offset uninitialized, access to store will be random depending on stack value and can segfault. Signed-off-by: Mika Kuoppala <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/tools: Add meson pthread dependancy for i965_asmMika Kuoppala2019-05-071-0/+1
| | | | | | Signed-off-by: Mika Kuoppala <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/tools: New i965 instruction assembler toolSagar Ghuge2019-05-075-0/+3040
| | | | | | | | | | | | | | Tool is inspired from igt's assembler tool. Thanks to Matt Turner, who mentored me through out this project. v2: Fix memory leaks and naming convention (Caio) v3: Fix meson changes (Dylan Baker) v4: Fix usage options (Matt Turner) Signed-off-by: Sagar Ghuge <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Matt Turner <[email protected]> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141
* anv: fix alphaToCoverage when there is no color attachmentSamuel Iglesias Gonsálvez2019-05-071-10/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are tests in CTS for alpha to coverage without a color attachment that are failing. This happens because we remove the shader color outputs when we don't have a valid color attachment for them, but when alpha to coverage is enabled we still want to preserve the the output at location 0 since we need the alpha component. In that case we will also need to create a null render target for RT 0. v2: - We already create a null rt when we don't have any, so reuse that for this case (Jason) - Simplify the code a bit (Iago) v3: - Take alpha to coverage from the key and don't tie this to depth-only rendering only, we want the same behavior if we have multiple render targets but the one at location 0 is not used. (Jason). - Rewrite commit message (Iago) v4: - Make sure we take into account the array length of the shader outputs, which we were no handling correctly either and make sure we also create null render targets for any invalid array entries too. v5: - Simplify removal of unused outputs by using rt_used[] so we don't have to special case alpha to coverage there too. Fixes the following CTS tests: dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Don't always require precise lowering of flrpIan Romanick2019-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8164367 -> 8135551 (-0.35%) instructions in affected programs: 3271235 -> 3242419 (-0.88%) helped: 13636 HURT: 90 helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1 helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97% HURT stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 2 HURT stats (rel) min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78% 95% mean confidence interval for instructions value: -2.13 -2.07 95% mean confidence interval for instructions %-change: -1.16% -1.13% Instructions are helped. total cycles in shared programs: 188719974 -> 188586222 (-0.07%) cycles in affected programs: 70415766 -> 70282014 (-0.19%) helped: 12563 HURT: 515 helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6 helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27% HURT stats (abs) min: 2 max: 54 x̄: 6.07 x̃: 4 HURT stats (rel) min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08% 95% mean confidence interval for cycles value: -10.56 -9.90 95% mean confidence interval for cycles %-change: -0.47% -0.45% Cycles are helped. LOST: 0 GAINED: 13 Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5Ian Romanick2019-05-061-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously lower_flrp32 was only set for vertex shaders. Fragment shaders performed a(1-c)+bc lowering during code generation. The shaders with loops hurt are SIMD8 and SIMD16 shaders for a text-identical fragment shader. v2: Rebase on 26391cceaa1 ("intel/compiler: Lower ffma on Gen4 and Gen5"). v3: Rebase on a004e95dd73 ("radeonsi/nir: create si_nir_opts() helper") Iron Lake total instructions in shared programs: 8211385 -> 8185974 (-0.31%) instructions in affected programs: 2503898 -> 2478487 (-1.01%) helped: 9936 HURT: 921 helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11% HURT stats (abs) min: 1 max: 12 x̄: 3.24 x̃: 2 HURT stats (rel) min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89% 95% mean confidence interval for instructions value: -2.43 -2.25 95% mean confidence interval for instructions %-change: -1.41% -1.33% Instructions are helped. total cycles in shared programs: 188523186 -> 188401198 (-0.06%) cycles in affected programs: 71541604 -> 71419616 (-0.17%) helped: 11649 HURT: 1871 helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6 helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 13.38 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17% 95% mean confidence interval for cycles value: -9.42 -8.63 95% mean confidence interval for cycles %-change: -0.54% -0.50% Cycles are helped. total loops in shared programs: 852 -> 856 (0.47%) loops in affected programs: 0 -> 4 helped: 0 HURT: 4 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00% 95% mean confidence interval for loops value: 1.00 1.00 95% mean confidence interval for loops %-change: 0.00% 0.00% Loops are HURT. LOST: 3 GAINED: 12 GM45 total instructions in shared programs: 5046407 -> 5033694 (-0.25%) instructions in affected programs: 1303584 -> 1290871 (-0.98%) helped: 5010 HURT: 464 helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2 helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08% HURT stats (abs) min: 1 max: 75 x̄: 3.39 x̃: 2 HURT stats (rel) min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87% 95% mean confidence interval for instructions value: -2.45 -2.20 95% mean confidence interval for instructions %-change: -1.40% -1.28% Instructions are helped. total cycles in shared programs: 128889476 -> 128812366 (-0.06%) cycles in affected programs: 44845402 -> 44768292 (-0.17%) helped: 6079 HURT: 940 helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8 helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 16.01 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17% 95% mean confidence interval for cycles value: -11.63 -10.34 95% mean confidence interval for cycles %-change: -0.58% -0.52% Cycles are helped. total loops in shared programs: 633 -> 635 (0.32%) loops in affected programs: 0 -> 2 helped: 0 HURT: 2 total spills in shared programs: 60 -> 69 (15.00%) spills in affected programs: 54 -> 63 (16.67%) helped: 0 HURT: 1 total fills in shared programs: 92 -> 105 (14.13%) fills in affected programs: 80 -> 93 (16.25%) helped: 0 HURT: 1 LOST: 15 GAINED: 15 Reviewed-by: Jason Ekstrand <[email protected]> [v2] Reviewed-by: Matt Turner <[email protected]> [v2]
* nir: Use the flrp lowering pass instead of nir_opt_algebraicIan Romanick2019-05-061-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <[email protected]>
* nir: nir_shader_compiler_options: drop native_integersChristian Gmeiner2019-05-071-1/+0
| | | | | | | | Driver which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv,i965: Stop warning about incomplete gen11 supportJason Ekstrand2019-05-031-4/+2
| | | | | | | | Both drivers are feature-complete and should be running more-or-less at perf at this point. Drop the warning. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* anv: fix crash when application does not provide push constantsLionel Landwerlin2019-05-031-1/+1
| | | | | | | | | | | | Found while running Talos Principle. As far as I can tell running a draw call with a pipeline having push constants without the application having called vkCmdPushConstants gives undefined push constant values. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: [email protected]
* intel/fs: Assert when brw_fs_nir sees a nir_deref_instrCaio Marcelo de Oliveira Filho2019-05-021-1/+1
| | | | | | | | | | | Since 09f1de97a76 "anv,i965: Lower away image derefs in the driver" the backend compiler is not expected to handle any derefs, so let's assert on it. This helps identifying problems when a deref is not lowered and "leaks" into the backend compiler. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Stop including POS in FS input limitsJason Ekstrand2019-05-021-1/+1
| | | | | | | It is an input but it comes in as part of the shader payload and doesn't count towards the limits. Reviewed-by: Kenneth Graunke <[email protected]>
* anv: add support for VK_EXT_memory_budgetEric Engestrom2019-04-303-0/+92
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: enable descriptor indexing capabilitiesJuan A. Suarez Romero2019-04-301-0/+2
| | | | | | | | | This enables the remaining capabilities in SPV_EXT_descriptor_indexing. Fixes: 6e230d7607f "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>