path: root/src/intel
Commit message | Author | Age | Files | Lines
* intel/blorp: Make blorp update the clear color in gen11.Rafael Antognolli2019-04-291-2/+38
| | | | | | | | | | Hardware docs say that Gen11 requires the use of two MI_ATOMICs of size QWORD when updating the clear color. The second MI_ATOMIC also needs CS Stall and Return Data Control set. v2: Remove include of srgb header (Lionel) Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Update MI_ATOMIC genxml definition.Rafael Antognolli2019-04-293-15/+117
| | | | | | | Change some of the single bit fields to booleans, and add an enum with the definition of the ATOMIC_OPCODE. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Support base-16 in value & start fields in gen_sort_tags.pyJordan Justen2019-04-291-2/+2
| | | | | | | | With python's int(), if the optional second parameter is 0, then python will support the 0x prefix for hex numbers. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
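A minimal sketch of the behaviour described above, using only the built-in `int()`. The helper name `parse_field` and the sample strings are illustrative assumptions, not code taken from gen_sort_tags.py.

```python
def parse_field(text):
    """Parse a decimal or 0x-prefixed hexadecimal string."""
    # Base 0 tells int() to infer the base from the literal's prefix,
    # so "0x20" is read as hexadecimal and "32" as decimal.
    return int(text, 0)


print(parse_field("32"))    # decimal input
print(parse_field("0x20"))  # hexadecimal input
```

One caveat of base 0 in Python 3: a string with a leading zero such as "010" is rejected with a ValueError, because it is not a valid decimal literal.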
* isl: Set ClearColorConversionEnable.Plamena Manolova2019-04-291-0/+21
| | | | | | | | | The ClearColorConversionEnable bit needs to be set for GEN11 when indirect clear colors are used. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]> Reviewed-by: Nanley Chery <[email protected]>
* delete autotools .gitignore filesEric Engestrom2019-04-297-30/+0
| | | | | | | | One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* Revert "anv: limit URB reconfigurations when using blorp"Lionel Landwerlin2019-04-293-11/+3
| | | | | | | | | | | | | | | | | | | In commit 0d46e404 ("anv: limit URB reconfigurations when using blorp") we tried to limit the number of URB reconfigurations by checking if the last allocation is large enough to fit the blorp dispatch. We used the last bound pipeline to compare the allocation. The problem with this is that the pipeline is bound but its commands might not have been emitted into the command buffer yet. Let's just revert commit 0d46e404677264bfb12ada15290e39c10a5eb455 since it didn't seem to yield any performance improvement. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535 Acked-by: Jason Ekstrand <[email protected]>
* intel/fs: Don't emit empty ELSE blocks.Kenneth Graunke2019-04-281-4/+4
| | | | | | | | While we can clean this up later, it's trivial to not generate the stupid code in the first place, which saves some optimization work. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv: expose VK_EXT_queue_family_foreign on AndroidTapani Pälli2019-04-291-0/+1
| | | | | | | | | | | | | | VK_ANDROID_external_memory_android_hardware_buffer requires this extension. It is safe to enable it since currently aux usage is disabled for ahw buffers. Fixes following dEQP extension dependency test on Android: dEQP-VK.api.info.device#extensions Cc: <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/descriptor_set: Don't fully destroy sets in pool destroy/resetJason Ekstrand2019-04-261-2/+3
| | | | | | | | | | | | | | | | | | | In 105002bd2d617, we fixed a memory leak bug where we weren't properly destroying descriptor sets when destroying/resetting a descriptor pool. However, the only real leak that happened was that we take a reference to the descriptor set layout in the descriptor set and we weren't dropping our reference. Everything else in the descriptor set is tied to the pool itself and doesn't need to be freed on a per-set basis. This commit changes the destroy/reset functions to only bother walking the list of sets to unref the layouts and otherwise we just assume that the whole-pool destroy/reset takes care of the rest. Now that we're doing more non-trivial things with descriptor sets such as allocating things with util_vma_heap, per-set destruction is starting to show up on perf traces. This takes reset back to where it's supposed to be as a cheap whole-pool operation. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Better handle 32-byte alignment of descriptor set buffersJason Ekstrand2019-04-261-3/+3
| | | | | | | | | | | | | | | | | | | In c520f4dec9c, we chose to align the sizes of descriptor set buffers to 32 bytes. We have to align the descriptor set buffer to 32B so that it's valid for using with push constants. We align the size as well so we don't leave lots of holes with util_vma_heap_alloc. Unfortunately, we were only aligning it for alloc and not for free so we were still creating piles of holes when we delete descriptor sets. This causes terrible perf for the allocator once we've deleted piles of descriptor sets. This commit reworks the code so that we align the descriptor set buffer size to 32B for both alloc and free. The result is that it takes the new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds. Fixes: c520f4dec9c "anv: Add a concept of a descriptor buffer" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497 Reviewed-by: Lionel Landwerlin <[email protected]>
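The effect of aligning the size on alloc but not on free can be illustrated with a toy range allocator. This is only a sketch under stated assumptions: `ToyHeap`, `align_up`, and `churn` are hypothetical names invented here, and the free-list logic is a simplified stand-in, not util_vma_heap.

```python
ALIGNMENT = 32  # bytes; the commit aligns descriptor set buffers to 32B


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


class ToyHeap:
    """Hypothetical first-fit range allocator that tracks free holes."""

    def __init__(self, size):
        self.holes = [(0, size)]  # sorted list of free (offset, size) ranges

    def alloc(self, size):
        for i, (offset, hole_size) in enumerate(self.holes):
            if hole_size >= size:
                if hole_size == size:
                    del self.holes[i]
                else:
                    self.holes[i] = (offset + size, hole_size - size)
                return offset
        raise MemoryError("out of space")

    def free(self, offset, size):
        self.holes.append((offset, size))
        self.holes.sort()
        merged = []
        for off, sz in self.holes:
            # Merge with the previous hole only if the two are contiguous.
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + sz)
            else:
                merged.append((off, sz))
        self.holes = merged


def churn(free_aligned):
    """Allocate and free eight buffers; return the number of holes left."""
    heap = ToyHeap(4096)
    raw_size = 40  # deliberately not a multiple of 32
    offsets = [heap.alloc(align_up(raw_size, ALIGNMENT)) for _ in range(8)]
    for offset in offsets:
        size = align_up(raw_size, ALIGNMENT) if free_aligned else raw_size
        heap.free(offset, size)
    return len(heap.holes)


print(churn(free_aligned=True), churn(free_aligned=False))
```

When the freed size matches the allocated size, neighbouring ranges merge back into a single hole; when the unaligned size is freed, every buffer leaves a small gap that prevents merging, so the hole list keeps growing.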
* intel/fs: Don't handle texop_tex for shaders without implicit LODCaio Marcelo de Oliveira Filho2019-04-252-6/+2
| | | | | | | | | These will be lowered by nir_lower_tex() with the lower_tex_when_implicit_lod_not_supported, so don't need the extra handling here. Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler/fs/icl: Use dummy masked urb write for tess evalTopi Pohjolainen2019-04-251-1/+50
| | | | | | | | | | | One cannot write the URB arbitrarily and therefore the message has to be carefully constructed. The clever tricks originate from Kenneth and Jason, I'm just writing the patch. Fixes GPU hangs on ICL with Vulkan CTS. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* Revert "intel/compiler: split is_partial_write() into two variants"Juan A. Suarez Romero2019-04-2511-54/+30
| | | | | | | | | | This reverts commit 40b3abb4d16af4cef0307e1b4904c2ec0924299e. It is not clear that this commit was entirely correct, and unfortunately it was pushed by error. CC: Jason Ekstrand <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* i965: fix icelake performance query enablingLionel Landwerlin2019-04-251-0/+2
| | | | | | | | | This was a rebase issue which lost a change to a file moved from i965 to src/intel/perf. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 134e750e16bfc5 ("i965: extract performance query metrics") Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: fix uninit non-static variable. (v2)Dave Airlie2019-04-251-0/+3
| | | | | | | | Pointed out by coverity. v2: init nir_locals also. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/isl: Resize clear color buffer to full cachelineRafael Antognolli2019-04-241-1/+2
| | | | | | | | | | | Fixes MCS fast clear gpu hangs with Vulkan CTS on ICL in CI. v2 (Nanley): In the title s/Align/Resize/ Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Nanley Chery <[email protected]> Tested-by: Topi Pohjolainen <[email protected]> Signed-off-by: Rafael Antognolli <[email protected]>
* anv/descriptor_set: Properly align descriptor buffer to a pageJason Ekstrand2019-04-241-1/+1
| | | | | | | | | Instead of aligning and then taking inline uniforms into account, we need to take inline uniforms into account and then align to a page. Otherwise, we may not be aligned to a page and allocation may fail. Fixes: 43f40dc7cb2 "anv: Implement VK_EXT_inline_uniform_block" Reviewed-by: Lionel Landwerlin <[email protected]>
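The ordering problem can be shown with plain arithmetic. This is a sketch, not the anv code: the 4096-byte page size, the function names, and the sample byte counts are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def size_align_then_add(descriptor_bytes, inline_uniform_bytes):
    # Old order: align first, then add inline uniform space on top.
    return align_up(descriptor_bytes, PAGE_SIZE) + inline_uniform_bytes


def size_add_then_align(descriptor_bytes, inline_uniform_bytes):
    # Fixed order: account for inline uniforms, then align the total.
    return align_up(descriptor_bytes + inline_uniform_bytes, PAGE_SIZE)


old = size_align_then_add(5000, 100)
new = size_add_then_align(5000, 100)
print(old, old % PAGE_SIZE == 0)
print(new, new % PAGE_SIZE == 0)
```

With the old order the final size is a page multiple only when the inline uniform size happens to be one as well; with the fixed order it always is.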
* anv/descriptor_set: Only vma_heap_finish if we have a descriptor bufferJason Ekstrand2019-04-241-2/+1
| | | | | Fixes: 7bb34ecff98 "anv: release memory allocated by bo_heap when..." Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/descriptor_set: Destroy sets before pool finalizationJason Ekstrand2019-04-241-5/+5
| | | | | Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..." Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/descriptor_set: Unlink sets from the pool in set_destroyJason Ekstrand2019-04-241-4/+4
| | | | | | | | | | anv_descriptor_pool_free_set is called on the clean-up path of anv_descriptor_set_create and the set may not have been added to the pool's list of sets yet. While we're here, we move adding it to that list into set_create for symmetry. Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..." Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: Fix D to W conversion in opt_combine_constantsIan Romanick2019-04-231-1/+1
| | | | | | | | | | | | | | | | | Found by GCC warning: src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’: src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] return ((reg->d & 0xffffu) < 0) != (imm->w < 0); ~~~~~~~~~~~~~~~~~~~^~~ The result of the bit-and is a 32-bit value with the top bits all zero. This will never be < 0. Instead of masking off the bits, just cast to int16_t and let the compiler handle the actual conversion. Fixes: e64be391dd0 ("intel/compiler: generalize the combine constants pass") Cc: Iago Toral Quiroga <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
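The bug can be modelled outside of C. The sketch below is a Python model of the two comparisons, not the Mesa source: `to_int16`, `sign_differs_masked`, and `sign_differs_cast` are hypothetical helper names. In C the masked comparison is always false because the expression is unsigned; in this model it is always false because the masked value lies in 0..0xFFFF.

```python
def to_int16(value):
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


def sign_differs_masked(d, w):
    # Models the original test: masking never produces a negative value,
    # so the left-hand side is always False.
    return ((d & 0xFFFF) < 0) != (w < 0)


def sign_differs_cast(d, w):
    # Models the fix: test the sign of the 16-bit value itself.
    return (to_int16(d) < 0) != (w < 0)


# d holds -5 and w holds +5, so the signs really do differ.
print(sign_differs_masked(-5, 5))
print(sign_differs_cast(-5, 5))
```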
* intel/compiler: Lower ffma on Gen4 and Gen5Ian Romanick2019-04-231-0/+4
| | | | | | | | | | | flrp32 is also a 3-source instruction, but there is another pending series that handles that for Gen4 and Gen5. v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen nir_options" Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* intel/compiler: Don't have sepearate, per-Gen nir_optionsIan Romanick2019-04-231-31/+11
| | | | | | | | | Instead, just have separate scalar vs. vector nir_options and do per-Gen "fix ups". Suggested-by: Jason Ekstrand <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vulkan/wsi: Add X11 adaptive sync support based on dri options.Bas Nieuwenhuizen2019-04-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | The dri options are optional. When the dri options are not provided the WSI will not use adaptive sync. FWIW I think for xf86-video-amdgpu this still requires an X11 config option, so only people who opt in can get possible regressions from this. So then the remaining question is: why do this in the WSI? It has been suggested in another MR that the application sets this. However, I disagree with that as I don't think we'll ever get a reasonable set of applications setting it. The next question is whether this can be a layer. It definitely can be as implemented now. However, I think this generally fits well with the function of the WSI. Furthermore, for e.g. the DISPLAY WSI this is much harder to do in a layer. Of course, most of the WSI could almost be a layer, but I think this still fits best in the WSI. Acked-by: Jason Ekstrand <[email protected]>
* anv: fix argument name for vkCmdEndQueryLionel Landwerlin2019-04-241-2/+2
| | | | | | | | | | Doesn't fix anything but it's not the right function prototype. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 673f33c77dd765 ("anv: Implement CmdBegin/EndQueryIndexed") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Sagar Ghuge <[email protected]>
* intel: workaround VS fixed function issue on Gen9 GT1 partsLionel Landwerlin2019-04-231-0/+12
| | | | | | | | | | | | The issue is noticeable in the dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d test where a triangle goes missing when we use the maximum number of URB entries as specified by the documentation. Signed-off-by: Lionel Landwerlin <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505 Reviewed-by: Kristian H. Kristensen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* intel/compiler: Improve fix_3src_operand()Matt Turner2019-04-221-5/+18
| | | | | | | | | | | | | | | | Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will be handled by opt_combine_constants()). Both are already allowed by opt_copy_propagation(). Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think those occur. This is sufficient to allow a pass added in a future commit (fs_visitor::lower_linterp) to avoid emitting extra MOV instructions. I removed the 'src.stride > 1' case because it seems wrong: 3-src instructions on Gen6-9 are align16-only and can only do stride=1 or stride=0. A run through Jenkins with an assert(src.stride <= 1) never triggers, so it seems that it was dead code. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: Add unit tests for sat prop for different exec sizesMatt Turner2019-04-221-0/+68
| | | | | | | The two new unit tests verify that propagating a saturate between instructions of different exec sizes does not happen. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/compiler: Use SIMD16 instructions in fs saturate prop unit testMatt Turner2019-04-221-59/+59
| | | | | | | | | | | | Will allow us to test that propagation between instructions of different exec sizes does not happen (in the next commit). The stray-looking change in intervening_dest_write is to adjust the size of the texture result to keep the test functioning identically when the instructions' exec sizes are doubled. Without the change, the texture does not overwrite the destination fully as the unit test intends. Reviewed-by: Rafael Antognolli <[email protected]>
* intel/fs: Remove fs_generator::generate_linterp from gen11+.Rafael Antognolli2019-04-221-44/+6
| | | | | | | | | We now have a lowering pass that will do this at the fs_visitor level, so we can remove this code from gen11+. v2: Reduce size of the "i" array from 4 to 2 (Matt). Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add a lowering pass for linear interpolation.Rafael Antognolli2019-04-222-0/+47
| | | | | | | | | | | | | | | | On gen11, instead of using a PLN instruction, we convert FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the fs_generator code. This patch adds a lowering pass that does the same thing at the fs_visitor. It also drops the usage of NF types, since we don't need the extra precision and it lets us skip the accumulator. With all that, some optimizations will still be run on the generated code, and we should get better scheduling. v2: Update comment about saturation and conditional mod (Matt) Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Move the scalar-region conversion to the generator.Rafael Antognolli2019-04-224-5/+5
| | | | | | | | | | Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Only propagate saturation if exec_size is the same.Rafael Antognolli2019-04-221-1/+2
| | | | | | | | | Otherwise it could propagate the saturation from a SIMD16 instruction into a SIMD8 instruction. With that, only part of the destination register, which is the source of the move with saturation, would have been updated. Reviewed-by: Matt Turner <[email protected]>
* intel/fs: Add support for float16 to the fsign optimizationsIan Romanick2019-04-201-6/+24
| | | | | | | | | | | | | | | Commit ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function") criss-crossed with c2b8fb9a810 ("anv/device: expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying enough attention when I rebased. This adds back the float16 changes and enables the optimization. v2: Incorporate more changes from 19cd2f5debd and a8d8b1a1391 that I missed in the previous version. Fixes: ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474 Reviewed-by: Matt Turner <[email protected]> [v1]
* anv: Rework the descriptor set layout create loopJason Ekstrand2019-04-191-14/+13
| | | | | | | | | | | Previously, we were storing the per-binding create info pointer in the immutable_samplers field temporarily so that we can switch the order in which we walk the loop. However, now that we have multiple arrays of structs to walk, it makes more sense to store an index of some sort. Because we want to leave immutable_samplers as NULL for undefined bindings, we store index + 1 and then subtract one later. Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
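The "index + 1" encoding described above can be sketched as follows. This is an illustration only: `map_bindings` and the dictionary layout of the create infos are hypothetical, and a plain integer 0 stands in for the NULL pointer of the C code.

```python
def map_bindings(create_infos, num_bindings):
    """Return, per binding slot, its create info or None if undefined."""
    slots = [0] * num_bindings  # 0 plays the role of NULL (undefined)
    for i, info in enumerate(create_infos):
        # Store index + 1 so that a valid index 0 is not mistaken for NULL.
        slots[info["binding"]] = i + 1

    result = []
    for slot in slots:
        # Subtract one again to recover the real index.
        result.append(None if slot == 0 else create_infos[slot - 1])
    return result


infos = [{"binding": 2, "name": "a"}, {"binding": 0, "name": "b"}]
print(map_bindings(infos, 3))
```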
* anv: Ignore descriptor binding flags if bindingCount == 0Jason Ekstrand2019-04-191-3/+2
| | | | | | | | | I missed this on the first go round. The bindingCount field of VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero which means the flags array is ignored. Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts" Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
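A small sketch of the rule stated above, assuming the create info is modelled as a dictionary with the Vulkan field names `bindingCount` and `pBindingFlags`. The helper `binding_flags_for` is a hypothetical name, not a function from anv.

```python
def binding_flags_for(flags_info, index):
    """Return the flags for one binding, or 0 when no flags apply."""
    # A missing struct or bindingCount == 0 both mean "no flags given",
    # so the flags array must not be read in either case.
    if flags_info is None or flags_info["bindingCount"] == 0:
        return 0
    return flags_info["pBindingFlags"][index]


print(binding_flags_for({"bindingCount": 0, "pBindingFlags": None}, 1))
print(binding_flags_for({"bindingCount": 2, "pBindingFlags": [1, 4]}, 1))
```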
* anv/nir: Add a central helper for figuring out SSBO address formatsJason Ekstrand2019-04-193-57/+98
| | | | Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement VK_EXT_descriptor_indexingJason Ekstrand2019-04-195-2/+93
| | | | | | | | | Now that everything is in place to do bindless for all resource types except input attachments and UBOs, VK_EXT_descriptor_indexing is "trivial". Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Put binding flags in descriptor set layoutsJason Ekstrand2019-04-192-0/+19
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use bindless handles for imagesJason Ekstrand2019-04-195-4/+63
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add support for bindless image load/store/atomicJason Ekstrand2019-04-193-8/+72
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use bindless textures and samplersJason Ekstrand2019-04-196-31/+228
| | | | | | | | This commit changes anv to put bindless handles and sampler pointers into the descriptor buffer and use those instead of bindful when we run out of binding table space. This "spilling" of descriptors allows us to advertise an almost unbounded number of images and samplers. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Pass the plane into lower_tex_derefJason Ekstrand2019-04-191-7/+5
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Use write_image_view to initialize immutable samplersJason Ekstrand2019-04-191-5/+13
| | | | | | | | | Instead of setting it manually, call the helper. When setting descriptor sets becomes more complicated than just setting some struct values, this will keep immutable sampler handling correct. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Count the number of planes in each descriptor bindingJason Ekstrand2019-04-192-3/+19
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* intel/fs: Add support for bindless texture opsJason Ekstrand2019-04-195-10/+86
| | | | | | | | | | | | | | We add two new texture sources for bindless surface and sampler handles. Bindless surface handles are expected to be pre-shifted so that the 20-bit surface state table index is in the top 20 bits of the 32-bit handle. This lets us avoid any extra shifts in the shader. Bindless sampler handles are 32-byte aligned byte offsets from general state base address. We use 32-byte aligned instead of 16-byte aligned to avoid having to use more indirect messages than needed. It means we can't tightly pack samplers but that's probably not a big deal. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
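The two handle layouts described above can be sketched with bit arithmetic. The constants follow the numbers in the commit message (a 20-bit index in the top bits of a 32-bit handle, 32-byte aligned sampler offsets); the function names are hypothetical and the code is an illustration, not the driver's implementation.

```python
SURFACE_INDEX_BITS = 20
SURFACE_SHIFT = 32 - SURFACE_INDEX_BITS  # 12 low bits are left as zero
SAMPLER_ALIGNMENT = 32                   # bytes


def pack_surface_handle(index):
    """Place a 20-bit surface state index in the top 20 bits of a handle."""
    if not 0 <= index < (1 << SURFACE_INDEX_BITS):
        raise ValueError("surface index does not fit in 20 bits")
    return index << SURFACE_SHIFT


def is_valid_sampler_handle(offset):
    """Check that a sampler handle is a 32-byte aligned 32-bit offset."""
    return 0 <= offset < (1 << 32) and offset % SAMPLER_ALIGNMENT == 0


print(hex(pack_surface_handle(1)))
print(is_valid_sampler_handle(64), is_valid_sampler_handle(48))
```

Pre-shifting on the CPU side means the shader can use the handle directly, which matches the commit's stated goal of avoiding extra shifts in the shader.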
* intel,nir: Lower TXD with a bindless samplerJason Ekstrand2019-04-191-0/+1
| | | | | | | | | When we have a bindless sampler, we need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement VK_KHR_shader_atomic_int64Jason Ekstrand2019-04-198-5/+51
| | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Implement SSBOs bindings with GPU addresses in the descriptor BOJason Ekstrand2019-04-196-35/+347
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds a new way for ANV to do SSBO bindings by just passing a GPU address in through the descriptor buffer and using the A64 messages to access the GPU address directly. This means that our variable pointers are now "real" pointers instead of a vec2(BTI, offset) pair. This carries a few advantages: 1. It lets us support a virtually unbounded number of SSBO bindings. 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't implement before because those atomic messages are only available in the bindless A64 form. 3. It's way better than messing around with bindless handles for SSBOs which is the only other option for VK_EXT_descriptor_indexing. 4. It's more future looking, maybe? At the least, this is what NVIDIA does (they don't have binding based SSBOs at all). This doesn't a priori mean it's better, it just means it's probably not terrible. The big disadvantage, of course, is that we have to start doing our own bounds checking for robustBufferAccess again and have to push in dynamic offsets. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* anv: Lower some SSBO operations in apply_pipeline_layoutJason Ekstrand2019-04-191-2/+212
| | | | | | | | | | | | In order to avoid the potential overhead of A64 operations on all SSBO ops, we look for those SSBO ops where we can get to the descriptor set from the SSBO access operation and lower those to a binding-table approach. When robustBufferAccess is enabled, this lets the hardware do the bounds checking for us. It also avoids some potentially expensive 64-bit integer calculations. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>