summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* anv: Enable VK_KHX_multiview and SPV_KHR_multiviewJason Ekstrand2017-05-032-0/+5
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/cmd_buffer: Emit instanced draws for multiple viewsJason Ekstrand2017-05-033-5/+135
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/cmd_buffer: Pull indirect draw parameter loading into a helperJason Ekstrand2017-05-031-10/+24
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/pipeline: Add shader lowering for multiviewJason Ekstrand2017-05-034-0/+244
| | | | | | | | v2 (Jason Ekstrand): - Take a view_mask rather than a whole subpass - Build the view mask into the VS shader key Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/pipeline: Add a subpass field to anv_pipelineJason Ekstrand2017-05-032-5/+8
| | | | | | This simplifies the code a variety of places. Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/pipeline: Call nir_gather_info laterJason Ekstrand2017-05-031-2/+2
| | | | | | | We want to insert more lowering code that may insert system values and we need to gather info after that lowering. Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Move shader hashing to anv_pipelineJason Ekstrand2017-05-033-45/+46
| | | | | | | | | | Shader hashing is very closely related to shader compilation. Putting them right next to each other in anv_pipeline makes it easier to verify that we're actually hashing everything we need to be hashing. The only real change (other than the order of hashing) is that we now hash in the shader stage. Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/pass: Store the per-subpass view maskJason Ekstrand2017-05-032-0/+21
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Add the KHX_multiview boilerplateJason Ekstrand2017-05-032-0/+18
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/nir: Delete the apply_dynamic_offsets prototypeJason Ekstrand2017-05-031-3/+0
| | | | | | | | That pass hasn't existed since dd4db84640bbb694f180dd50850c3388f67228be but the prototype stuck around for no reason. Reviewed-by: Elie Tournier <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: don't modify regioning parameters to the sources of DF align1 ↵Samuel Iglesias Gonsálvez2017-05-031-8/+1
| | | | | | | | | | | | | instructions The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. That latter fix previously done in the generator was strictly speaking wrong for any non-identity regions. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix register width for DF VGRF and UNIFORMSamuel Iglesias Gonsálvez2017-05-031-5/+7
| | | | | | | | | | | | | | | | | | | | | | | On gen7, the swizzles used in DF align16 instructions works for element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for this (scalarize_df()), we need to set it to two again. However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, an uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access to the first 4. This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one. v2: - Remove conditional (Curro). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix vertical stride to avoid breaking region parameter ruleSamuel Iglesias Gonsálvez2017-05-031-18/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters": "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride." In next patch, we are going to modify the region parameter for uniforms and vgrf. For uniforms that are the source of DF align1 instructions, they will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit. As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here. v2: - Move is_align1_df() (Curro) - Refactor exec_size == width calculation (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Cc: "17.1" <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* anv/tests: Create a dummy instance as well as deviceJason Ekstrand2017-05-014-4/+16
| | | | | | | | | This fixes crashes caused by 35e626bd0e59e7ce9fd97ccef66b2468c09206a4 which made us start referencing the instance in the allocators. With this commit, the tests now happily pass again. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100877 Tested-by: Vinson Lee <[email protected]>
* anv: Drop 'x11' prefix from non-X11 WSI funcsChad Versace2017-04-281-16/+16
| | | | | | | Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The functions are used by Wayland WSI too. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Alphabetize KHR extensionsJason Ekstrand2017-04-281-18/+18
| | | | Reviewed-by: Alejandro Piñeiro <[email protected]>
* anv: Move queues, events, and semaphores to their own fileJason Ekstrand2017-04-273-484/+516
| | | | | | | Things are about to get more complicated, especially as far as semaphores are concerned. Reviewed-by: Chad Versace <[email protected]>
* anv: Implement VK_KHX_external_memory_fdJason Ekstrand2017-04-273-18/+113
| | | | | | | | | | | | | | | | | | This commit just exposes the memory handle type. There's interesting we need to do here for images. So long as the user doesn't set any crazy environment variables such as INTEL_DEBUG=nohiz, all of the compression formats etc. should "just work" at least for opaque handle types. v2 (chadv): - Rebase. - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when handleType == 0. - Move handleType-independency comments out of handleType-switch, in vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in future dma_buf patches. Co-authored-with: Chad Versace <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* anv: Use the BO cache for DeviceMemory allocationsJason Ekstrand2017-04-275-26/+30
| | | | Reviewed-by: Chad Versace <[email protected]>
* anv/allocator: Add a BO cacheJason Ekstrand2017-04-272-0/+278
| | | | | | | | | | | | This cache allows us to easily ensure that we have a unique anv_bo for each gem handle. We'll need this in order to support multiple-import of memory objects and semaphores. v2 (Jason Ekstrand): - Reject BO imports if the size doesn't match the prime fd size as reported by lseek(). Reviewed-by: Chad Versace <[email protected]>
* anv: Implement VK_KHX_external_memoryJason Ekstrand2017-04-272-0/+5
| | | | | | | This is the trivial implementation that just exposes the extension string but exposes zero external handle types. Reviewed-by: Chad Versace <[email protected]>
* anv: Implement VK_KHX_external_memory_capabilitiesChad Versace2017-04-274-14/+116
| | | | | | | | | | | | | | | | | | This is a complete but trivial implementation. It's trivial becasue We support no external memory capabilities yet. Most of the real work in this commit is in reworking the UUIDs advertised by the driver. v2 (chadv): - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR. Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of input structs, not the chain of output structs. - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the input chain and the output chain separately. Reduces diff in future dma_buf patches. Co-authored-with: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/physical_device: Rename uuid to pipeline_cache_uuidJason Ekstrand2017-04-273-5/+6
| | | | | | | We're about to have more UUIDs for different things so this one really needs to be properly labeled. Reviewed-by: Chad Versace <[email protected]>
* anv: Refactor device_get_cache_uuid into physical_device_init_uuidsJason Ekstrand2017-04-271-13/+17
| | | | Reviewed-by: Chad Versace <[email protected]>
* anv: Set EXEC_OBJECT_ASYNC when availableJason Ekstrand2017-04-274-0/+10
| | | | Reviewed-by: Chad Versace <[email protected]>
* anv/cmd_buffer: Use the device allocator for QueueSubmitJason Ekstrand2017-04-271-3/+3
| | | | | | | | The command is really operating on a Queue not a command buffer and the nearest object to that with an allocator is VkDevice. Reviewed-by: Chad Versace <[email protected]> Cc: "17.0 17.1" <[email protected]>
* anv: Don't place scratch buffers above the 32-bit boundaryJason Ekstrand2017-04-271-0/+19
| | | | | | | | | | | | This fixes rendering corruptions in DOOM. Hopefully, it will also make Jenkins a bit more stable as we've been seeing some random failures and GPU hangs ever since turning on 48bit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100620 Fixes: 651ec926fc1 "anv: Add support for 48-bit addresses" Tested-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "17.1" <[email protected]>
* genxml: Fix gen_pack_header.py crash when field type is invalid.Rafael Antognolli2017-04-241-2/+2
| | | | | | | | Just return earlier in that case. Also set prefix to an empty string, so we don't get to use it undefined. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* genxml: Make BLEND_STATE command support variable length array.Rafael Antognolli2017-04-247-48/+74
| | | | | | | | | | | | | | | | | | | | | | | | We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. v2: - Use designated initializers on blorp and remove 0 from initialization (Jason) - Default entries to disabled on Vulkan (Jason) - Rebase code. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* genxml: Fix python crash when no dwords are found.Rafael Antognolli2017-04-241-5/+12
| | | | | | | | | | | | | | | If the 'dwords' dict is empty, max(dwords.keys()) throws an exception. This case could happen when we have an instruction that is only an array of other structs, with variable length. v2: - Add another clause for empty dwords and make it work with python 3 (Dylan) - Set the length to 0 if dwords is empty, and do not declare dw Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* genxml: Remove unused parameter.Rafael Antognolli2017-04-241-2/+2
| | | | | | | 'start' parameter from Group.emit_pack_function() is useless. Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/aubinator: Correctly read variable length structs.Rafael Antognolli2017-04-243-6/+54
| | | | | | | | | | | | | | | | Before this commit, when a group with count="0" is found, only one field is added to the struct representing the instruction. This causes only one entry to be printed by aubinator, for variable length groups. With this commit we "detect" that there's a variable length group (count="0") and store the offset of the last entry added to the struct when reading the xml. When finally reading the aubdump file, we check the size of the group and whether we have variable number of elements, and in that case, reuse the last field to add the remaining elements. Signed-off-by: Rafael Antognolli <[email protected]> Tested-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* isl/format: Update the R16G16B16X16_FLOAT entryNanley Chery2017-04-241-1/+1
| | | | | | | | | | | The section of the PRM mentioned in the code comment above this table says that this format supports the render target write message. Internal documentation says that this format also supports alpha blending. As a side effect, this allows CCS_D buffers to be created for images with this format. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* anv/pass: Delete anv_pass::subpass_attachmentsNanley Chery2017-04-241-1/+0
| | | | | | | This field has no users. Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* intel/fs: Take into account amount of data read in spilling cost heuristic.Francisco Jerez2017-04-241-1/+1
| | | | | | | | | | | | | | | | | | | Until now the spilling cost calculation was neglecting the amount of data read from the register during the spilling cost calculation. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary. Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit 147e71242ce539ff28e282f009c332818c35f5ac with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.Francisco Jerez2017-04-241-2/+1
| | | | | | | | | | This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation. Cc: <[email protected]> Reviewed-by: Plamena Manolova <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.Kenneth Graunke2017-04-241-4/+4
| | | | | | | | | Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them). Cc: [email protected] Reviewed-by: Francisco Jerez <[email protected]>
* nir/i965: add before ffma algebraic optsTimothy Arceri2017-04-241-0/+6
| | | | | | | | | | | | | | | | | | | | | | | This shuffles constants down in the reverse of what the previous patch does and applies some simpilifications that may be made possible from doing so. Shader-db results BDW: total instructions in shared programs: 12980814 -> 12977822 (-0.02%) instructions in affected programs: 281889 -> 278897 (-1.06%) helped: 1231 HURT: 128 total cycles in shared programs: 246562852 -> 246567288 (0.00%) cycles in affected programs: 11271524 -> 11275960 (0.04%) helped: 1630 HURT: 1378 V2: mark float opts as inexact Reviewed-by: Elie Tournier <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce().Kenneth Graunke2017-04-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | opt_register_coalesce() was optimizing sequences such as: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D mov(8) m4.zw:F, vgrf5.xxxy:F into: mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue. No change in shader-db on Haswell. Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves). Acked-by: Timothy Arceri <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/query: Use genxml for MI_MATHJason Ekstrand2017-04-201-43/+28
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed by: Iago Toral Quiroga <[email protected]>
* genxml: Add better support for MI_MATHJason Ekstrand2017-04-203-12/+195
| | | | | | | | This breaks the guts of MI_MATH (the instruction part) out into its own structure with proper named values. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed by: Iago Toral Quiroga <[email protected]>
* genxml/pack: Allow hex values in the XMLJason Ekstrand2017-04-201-1/+2
| | | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed by: Iago Toral Quiroga <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv/cmd_buffer: Disable CCS on BDW input attachmentsNanley Chery2017-04-172-30/+13
| | | | | | | | | | | | | | | | | The description under RENDER_SURFACE_STATE::RedClearColor says, For Sampling Engine Multisampled Surfaces and Render Targets: Specifies the clear value for the red channel. For Other Surfaces: This field is ignored. This means that the sampler on BDW doesn't support CCS. Cc: Samuel Iglesias Gonsálvez <[email protected]> Cc: Jordan Justen <[email protected]> Cc: <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Nanley Chery <[email protected]>
* anv: blorp: flush memory after copyLionel Landwerlin2017-04-171-2/+2
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: "13.0 17.0" <[email protected]>
* intel/decoder: Fix is_header_field starting condition.Kenneth Graunke2017-04-161-1/+1
| | | | | | | | | | | Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add the pci_id into the shader cache UUIDJason Ekstrand2017-04-141-5/+15
| | | | | | | | | | This prevents a user from using a cache created on one hardware generation on a different one. Of course, with Intel hardware, this requires moving their drive from one machine to another but it's still possible and we should prevent it. Reviewed-by: Chad Versace <[email protected]> Cc: [email protected]
* i965: Use correct VertStride on align16 instructions.Matt Turner2017-04-141-10/+34
| | | | | | | | | | | | | | | | | | | | | | In commit c35fa7a, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/dce: improve track of partial flag register writesSamuel Iglesias Gonsálvez2017-04-141-1/+1
| | | | | | | | | | | | | | | This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: don't do horizontal stride on some register file typesSamuel Iglesias Gonsálvez2017-04-141-2/+5
| | | | | | | | | | | | | horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.Matt Turner2017-04-141-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <[email protected]>