path: root/src/intel
Commit message | Author | Age | Files | Lines
* anv: Import mako templates only during execution of anv_extensionsAndres Gomez2018-01-111-5/+5
| | | | | | | | | | | | | | | | | | anv_extensions usage from anv_icd was bringing the unwanted dependency of mako templates for the latter. We don't want that since it will force the dependency even for distributable tarballs which was not needed until now. Jason suggested this approach. v2: Patch simplification (Jason). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551 Fixes: 0ab04ba979b ("anv: Use python to generate ICD json files") Cc: Jason Ekstrand <[email protected]> Cc: Emil Velikov <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix maxDescriptorSet* limitsSamuel Iglesias Gonsálvez2018-01-111-5/+5
| | | | | | | | | | | | | | | "The maxDescriptorSet* limit is n times the corresponding maxPerStageDescriptor* limit, where n is the number of shader stages supported by the VkPhysicalDevice. If all shader stages are supported, n = 6 (vertex, tessellation control, tessellation evaluation, geometry, fragment, compute)." Fixes: dEQP-VK.api.info.device.properties Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
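The rule quoted in this commit message is a single multiplication. The following is a minimal sketch of that arithmetic only; the per-stage values and the helper name `max_descriptor_set_limit` are hypothetical and are not taken from anv.

```python
# Shader stages listed in the quoted text; n = 6 when all are supported.
SHADER_STAGES = (
    "vertex",
    "tessellation control",
    "tessellation evaluation",
    "geometry",
    "fragment",
    "compute",
)


def max_descriptor_set_limit(max_per_stage, num_stages=len(SHADER_STAGES)):
    """Return a maxDescriptorSet* limit from its maxPerStageDescriptor* limit."""
    return num_stages * max_per_stage


# Hypothetical per-stage limits, for illustration only.
per_stage = {"samplers": 64, "uniform_buffers": 64}
per_set = {name: max_descriptor_set_limit(v) for name, v in per_stage.items()}
```

With all six stages supported, a per-stage limit of 64 corresponds to a per-set limit of 384.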
* i965/nir: add a helper to lower gl_PatchVerticesIn to a uniformIago Toral Quiroga2018-01-101-0/+2
| | | | | | | | | | | | v2: do not try to handle it as a system value directly for the SPIR-V path. In GL we rather handle it as a uniform like we do for the GLSL path (Jason). v3: - Remove the uniform variable, it is always -1 now (Jason) - Also do the lowering for the TessEval stage (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: Add missing INSTDONE_1 bits on Gen7.5+.Kenneth Graunke2018-01-094-0/+8
| | | | | | This will make aubinator_error_decode decode them properly. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Apply Geminilake "Barrier Mode" workaround.Kenneth Graunke2018-01-092-0/+29
| | | | | | | | | | | | | | | | | | | Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <[email protected]>
* aubinator: add support for aubinating memtrace aubsScott D Phillips2018-01-081-35/+83
| | | | | | | | | Memtrace aubs are similar to classic aubs, with the major difference being how command submission is serialized (as register writes instead of a high-level submit message). Some internal tools generate or consume only memtrace aubs. Reviewed-by: Jordan Justen <[email protected]>
* aubinator: extract aubinator_init() out of the header handler functionScott D Phillips2018-01-081-16/+23
| | | | | | | A later patch will use the aubinator_init() function from the memtrace aub header handler. Reviewed-by: Jordan Justen <[email protected]>
* aubinator: honor --color option when printing the headerScott D Phillips2018-01-081-1/+5
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Allow PMA optimization to be enabled in secondary command buffersAlex Smith2018-01-081-1/+21
| | | | | | | | | | | | | | | | | This was never enabled in secondary buffers because hiz_enabled was never set to true for those. If the app provides a framebuffer in the inheritance info when beginning a secondary buffer, we can determine if HiZ is enabled and therefore allow the PMA optimization to be enabled within the command buffer. This improves performance by ~13% on an internal benchmark on Skylake. v2: Use anv_cmd_buffer_get_depth_stencil_view(). Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Take write mask into account in has_color_buffer_write_enabledAlex Smith2018-01-051-9/+18
| | | | | | | | | | | | | | | | | | | | If we have a color attachment, but its writes are masked, this would have still returned true. This is inconsistent with how HasWriteableRT in 3DSTATE_PS_BLEND is set, which does take the mask into account. This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA if the fragment shader does use UAVs, meaning the fragment shader may not be invoked because HasWriteableRT is false. Specifically, this was seen to occur when the shader also enables early fragment tests: the fragment shader was not invoked despite passing depth/stencil. Fix by taking the color write mask into account in this function. This is consistent with how things are done on i965. Signed-off-by: Alex Smith <[email protected]> Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
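The fix described in this commit message comes down to treating a fully masked attachment as not written. The sketch below shows that check in isolation. It assumes a hypothetical representation of attachments as `(bound, color_write_mask)` pairs and is not the driver's actual data structure or code.

```python
def has_color_buffer_write_enabled(attachments):
    """Return True if at least one bound color attachment has a non-zero
    write mask.

    `attachments` is an iterable of (bound, color_write_mask) pairs; this
    layout is an assumption made for the example.
    """
    return any(bound and mask != 0 for bound, mask in attachments)


# One bound attachment whose writes are fully masked: no color write occurs.
masked_only = [(True, 0x0)]
# One bound attachment with all four channels (RGBA) enabled.
rgba_enabled = [(True, 0xF)]
```

Checking only whether an attachment is bound would report `True` for `masked_only`, which is the inconsistency the commit message describes.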
* anv: Add missing unlock in anv_scratch_pool_allocAlex Smith2018-01-041-1/+3
| | | | | | | | | Fixes hangs seen due to the lock not being released here. Signed-off-by: Alex Smith <[email protected]> Cc: [email protected] Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
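A lock that is not released on one exit path will hang the next caller. The sketch below illustrates that general pattern and one way to avoid it. The `ScratchPool` class, its cache, and its early-return path are hypothetical and do not reproduce anv_scratch_pool_alloc.

```python
import threading


class ScratchPool:
    """Toy allocator that caches one buffer per key (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def alloc(self, key, size):
        # The `with` block releases the lock on every exit path, including
        # the early return below. With manual acquire()/release() calls it
        # is easy to forget the release before an early return.
        with self._lock:
            existing = self._cache.get(key)
            if existing is not None:
                return existing  # early return: lock is still released
            buf = bytearray(size)
            self._cache[key] = buf
            return buf


pool = ScratchPool()
first = pool.alloc("fragment", 16)
second = pool.alloc("fragment", 16)  # takes the early-return path
```

If the early return skipped the release, the second call would succeed but any later `alloc` would block forever on the held lock.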
* i965: Drop support for the legacy SNORM -> Float equation.Kenneth Graunke2018-01-029-44/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older OpenGL defines two equations for converting from signed-normalized to floating point data. These are:
    f = (2c + 1)/(2^b - 1)             (equation 2.2)
    f = max{c/(2^(b-1) - 1), -1.0}     (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be used in all scenarios, and remove equation 2.2. DirectX uses equation 2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+ systems that use the vertex fetcher hardware to do the conversions always get formula 2.3. This can make a big difference for 10-10-10-2 formats - the 2-bit value can represent 0 with equation 2.3, and cannot with equation 2.2. Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES. Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use the new rules, at least in core profile. That would leave Gen4-6 doing something different than all other hardware, which seems...lame. With context version promotion, applications that requested a pre-4.2 context may get promoted to 4.2, and thus get the new rules. Zero cases have been reported of this being a problem. However, we've received a report that following the old rules breaks expectations. SuperTuxKart apparently renders the cars red when following equation 2.2, and works correctly when following equation 2.3: https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405 So, this patch deletes the legacy equation 2.2 support entirely, making all hardware and APIs consistently use the new equation 2.3 rules. If we ever find an application that truly requires the old formula, then we'd likely want that application to work on modern hardware, too. We'd likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on pre-Haswell without the corresponding Piglit patch to accept either formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1): draw-vertices-2101010: Accept either SNORM conversion formula. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
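The claim about 10-10-10-2 formats in this commit message can be checked by evaluating both equations for every 2-bit signed value. The sketch below is a direct transcription of equations 2.2 and 2.3 as written in the message; the function names are made up for the example and are not driver code.

```python
def snorm_to_float_eq22(c, b):
    """Legacy equation 2.2: f = (2c + 1) / (2^b - 1)."""
    return (2 * c + 1) / (2 ** b - 1)


def snorm_to_float_eq23(c, b):
    """Equation 2.3: f = max(c / (2^(b-1) - 1), -1.0)."""
    return max(c / (2 ** (b - 1) - 1), -1.0)


def snorm_range(b):
    """All signed b-bit two's-complement integers, lowest first."""
    return range(-(2 ** (b - 1)), 2 ** (b - 1))


# The 2-bit component of a 10-10-10-2 format takes the values -2, -1, 0, 1.
legacy = [snorm_to_float_eq22(c, 2) for c in snorm_range(2)]
modern = [snorm_to_float_eq23(c, 2) for c in snorm_range(2)]
```

Equation 2.2 spreads the four codes evenly over [-1, 1] and so never lands on zero, while equation 2.3 maps the integer 0 to 0.0 and clamps the most negative code to -1.0.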
* i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.Kenneth Graunke2017-12-308-19/+14
| | | | | | These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/device: Mark all state buffers as needing captureJason Ekstrand2017-12-281-3/+3
| | | | | | | Previously, we were flagging the instruction state buffer for capture but not surface state or dynamic state. We want those captured too. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/aubinator: Gracefully handle dynamic state not being availableJason Ekstrand2017-12-281-0/+5
| | | | | | | | Some older versions of the Vulkan driver didn't properly tag dynamic state as needing to be captured. Also, this prevents crashes when looking at dumps on older kernels. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/aubinator: Free section data lastJason Ekstrand2017-12-281-2/+4
| | | | | | | | | We were walking the sections, printing the batches, and then freeing them in one pass. If the batch happens to reference any earlier sections (which it almost certainly will since it's at the end), we will access freed memory. Reviewed-by: Lionel Landwerlin <[email protected]>
* Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"Anuj Phogat2017-12-221-8/+4
| | | | | | | | | | | This reverts commit 9cd60fce9c22737000a8f8dc711141f8a523fe75. Above commit caused 2000+ piglit tests to assert fail. Disabling the align1 mode on gen10 for now to avoid failures. Cc: Matt Turner <[email protected]> Cc: Rafael Antognolli <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Rafael Antognolli <[email protected]>
* intel/fs: Initialize fs_visitor::grf_used on construction.Francisco Jerez2017-12-211-0/+1
| | | | | | | | | | | | | | | This should shut up some Valgrind errors during pre-regalloc scheduling. The errors were harmless since they could only have led to the estimation of the bank conflict penalty of an instruction pre-regalloc, which is inaccurate at that point of the program compilation, but no less accurate than the intended "return 0" fall-back path. The scheduling pass is normally re-run after regalloc with a well-defined grf_used value and accurate bank conflict information. Fixes: acf98ff933d "intel/fs: Teach instruction scheduler about GRF bank conflict cycles." Reported-by: Eero Tamminen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to obtain vector storage.Francisco Jerez2017-12-211-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | The weight_vector_type constructor was inadvertently assuming C++17 semantics of the new operator applied on a type with alignment requirement greater than the largest fundamental alignment. Unfortunately on earlier C++ dialects the implementation was allowed to raise an allocation failure when the alignment requirement of the allocated type was unsupported, in an implementation-defined fashion. It's expected that a C++ implementation recent enough to implement P0035R4 would have honored allocation requests for such over-aligned types even if the C++17 dialect wasn't active, which is likely the reason why this problem wasn't caught by our CI system. A more elegant fix would involve wrapping the __SSE2__ block in a '__cpp_aligned_new >= 201606' preprocessor conditional and continue taking advantage of the language feature, but that would yield lower compile-time performance on old compilers not implementing it (e.g. GCC versions older than 7.0). Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226 Reported-by: Józef Kucia <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments()Samuel Iglesias Gonsálvez2017-12-201-0/+2
| | | | | | | | Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed in the passed VkClearRect struct. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler/gen10: Disable push constants.Rafael Antognolli2017-12-191-0/+9
| | | | | | | | | | | | We still have gpu hangs on Cannonlake when using push constants, so disable them for now until we have a proper fix for these hangs. v2: Add warning message when creating context too. Signed-off-by: Rafael Antognolli <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* anv: Remove unused variable.Bas Nieuwenhuizen2017-12-171-2/+0
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.Kenneth Graunke2017-12-151-1/+3
| | | | | | | | | | | | | | According to the RENDER_SURFACE_STATE internal documentation, the R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply it to Ivybridge and Baytrail, but not Haswell. Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell. Changes these tests from crashing to skipping on Haswell: - KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f - KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f Reviewed-by: Jason Ekstrand <[email protected]>
* intel/tools: Convert aubinator over to the common frameworkJason Ekstrand2017-12-143-690/+33
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode registersJason Ekstrand2017-12-141-0/+13
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode dynamic stateJason Ekstrand2017-12-141-0/+81
| | | | | | | | Unfortunately, in aubinator and aubinator_error_decode we don't always know how many of a given state we have, so we must guess. One day, we'll come up with a way to annotate the batch to solve this problem. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode constants, binding tables, and samplersJason Ekstrand2017-12-141-0/+73
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Switch aubinator_error_decode over to the gen_print_batchJason Ekstrand2017-12-143-205/+37
| | | | | | | The shared framework can now do everything that aubinator_error_decode ever did and more. It's time to make the switch. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode graphics shadersJason Ekstrand2017-12-141-0/+95
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode vertex and index buffersJason Ekstrand2017-12-142-0/+161
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOADJason Ekstrand2017-12-141-0/+145
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Add the start of a generic batch decoderJason Ekstrand2017-12-142-0/+306
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Expose the raw field value in the iteratorJason Ekstrand2017-12-142-1/+3
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/disasm: Take a devinfo in gen_disasm_createJason Ekstrand2017-12-144-8/+7
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Take a bit offset in gen_print_groupJason Ekstrand2017-12-144-18/+22
| | | | | | | | | | Previously, if a group was nested in another group such that it didn't start on a dword boundary, we would decode it as if it started at the start of its first dword. This changes things to work even more in terms of bits so that we can properly decode these structs. This affects MOCS, attribute swizzles, and several other things. Reviewed-by: Lionel Landwerlin <[email protected]>
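The difference between decoding from the start of a dword and decoding from a bit offset can be shown with a small example. This is a minimal sketch of bit-granular field extraction; `extract_bits` and `extract_nested_field` are hypothetical helpers and do not mirror the gen_print_group interface.

```python
def extract_bits(dwords, start_bit, end_bit):
    """Return the unsigned field in bits [start_bit, end_bit] (inclusive),
    where bit 0 is the least significant bit of dwords[0]."""
    # Concatenate the 32-bit dwords into one integer, dword 0 lowest.
    value = 0
    for i, dw in enumerate(dwords):
        value |= (dw & 0xFFFFFFFF) << (32 * i)
    width = end_bit - start_bit + 1
    return (value >> start_bit) & ((1 << width) - 1)


def extract_nested_field(dwords, group_bit_offset, field_start, field_end):
    """Decode a field whose bit positions are relative to a nested group
    that begins at an arbitrary bit offset."""
    return extract_bits(
        dwords, group_bit_offset + field_start, group_bit_offset + field_end
    )


# A nested group starting at bit 16 of the first dword, with an 8-bit field
# at bits 0..7 of the group.
packet = [0x00AB0000]
correct = extract_nested_field(packet, 16, 0, 7)   # honors the bit offset
rounded = extract_nested_field(packet, 0, 0, 7)    # offset rounded down to the dword
```

Here `correct` recovers 0xAB, while `rounded` reads the low byte of the dword and gets 0, which is the kind of misdecode the commit message describes.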
* intel/decoder: Stop rounding down to the nearest dwordJason Ekstrand2017-12-141-11/+12
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Convert the iterator to work entirely in bitsJason Ekstrand2017-12-142-12/+9
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Drop gen_field_decode helperJason Ekstrand2017-12-142-11/+0
| | | | | | It's unused Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.Francisco Jerez2017-12-123-7/+19
| | | | | | | Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199 Reported-by: Darius Spitznagel <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv: fix bug when using component qualifier in FS outputsSamuel Iglesias Gonsálvez2017-12-121-19/+44
| | | | | | | | | | | | | | | | | | | | | | | We can write to the same output but in different components, like in this example: layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0; layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1; Therefore, they are not two different outputs but only one. Fixes: dEQP-VK.glsl.440.linkage.varying.component.frag_out.* v3: - Remove FRAG_RESULT_MAX. - Add const and use sizeof (Ian). - Do three-pass to set properly the locations of fragment outputs when having arrays (Jason). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable UBO pushingJason Ekstrand2017-12-082-0/+7
| | | | | | | | | | | | | Push constants on Intel hardware are significantly more performant than pull constants. Since most Vulkan applications don't actively use push constants, or at least don't use them heavily, we're pulling way more than we should be. By enabling pushing chunks of UBOs we can get rid of a lot of those pulls. On my SKL GT4e, this improves the performance of Dota 2 and Talos by around 2.5% and improves Aztec Ruins by around 2%. Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Handle !supports_pull_constants and push UBOs properlyJason Ekstrand2017-12-081-1/+1
| | | | | | In Vulkan, we don't support classic pull constants and everything the client asks us to push, we push. However, for pushed UBOs, we still want to fall back to conventional pulls if we run out of space.
* anv/device: Increase the UBO alignment requirement to 32Jason Ekstrand2017-12-081-2/+10
| | | | | | | | Push constants work in terms of 32-byte chunks, so if we want to be able to push UBOs, everything needs to be 32-byte aligned. Currently, we only require 16-byte alignment, which is too small. Reviewed-by: Jordan Justen <[email protected]>
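The alignment argument in this commit message can be illustrated with plain arithmetic. The sketch below assumes only what the message states, namely that pushes are described in 32-byte chunks; the constant and helper names are hypothetical.

```python
PUSH_CHUNK_BYTES = 32  # push constants are described in 32-byte chunks


def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)


def starts_on_push_chunk(offset):
    """True if a UBO bound at this byte offset begins on a chunk boundary."""
    return offset % PUSH_CHUNK_BYTES == 0


# An offset that satisfies a 16-byte requirement but not a 32-byte one.
offset_16_aligned = 48
next_pushable = align_up(offset_16_aligned, PUSH_CHUNK_BYTES)
```

An offset of 48 is legal under a 16-byte requirement yet falls in the middle of a 32-byte chunk, so it could not be expressed as a whole number of chunks from the start of the buffer.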
* anv/cmd_buffer: Add support for pushing UBO rangesJason Ekstrand2017-12-082-33/+112
| | | | | | | | In order to do this we have to modify push constant set up to handle ranges. We also have to tweak the way we handle dirty bits a bit so that we re-push whenever a descriptor set changes. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Add some stage assertsJason Ekstrand2017-12-081-0/+6
| | | | | | | There are several places where we look up opcodes in an array of stages. Assert that we don't end up going out-of-bounds. Reviewed-by: Jordan Justen <[email protected]>
* anv/cmd_buffer: Add some helpers for working with descriptor setsJason Ekstrand2017-12-081-11/+34
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv/pipeline: Translate vulkan_resource_index to a constant when possibleJason Ekstrand2017-12-081-4/+13
| | | | | | | | | | We want to call brw_nir_analyze_ubo_ranges immediately after anv_nir_apply_pipeline_layout and it badly wants constants. We could run an optimization step and let constant folding do it but that's way more expensive than needed. It's really easy to just handle constants in apply_pipeline_layout. Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Rewrite assign_constant_locationsJason Ekstrand2017-12-081-133/+185
| | | | | | | | | | | | | | | | | | | | This rewires the logic for assigning uniform locations to work in terms of "complex alignments". The basic idea is that, as we walk the list of instructions, we keep track of the alignment and continuity requirements of each slot and assert that the alignments all match up. We then use those alignments in the compaction stage to ensure that everything gets placed at a properly aligned register. The old mechanism handled alignments by special-casing each of the bit sizes and placing 64-bit values first followed by 32-bit values. The old scheme had the advantage of never leaving a hole since all the 64-bit values could be tightly packed and so could the 32-bit values. However, the new scheme has no type size special cases so it handles not only 32 and 64-bit types but should gracefully extend to 16 and 8-bit types as the need arises. Tested-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* anv: Disable VK_KHR_16bit_storageJason Ekstrand2017-12-082-3/+3
| | | | | | | | | | | | The testing for this extension is currently very poor. The CTS tests only test accessing UBOs and SSBOs at dynamic offsets so none of our constant-offset paths get triggered at all. Also, there's an assertion in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which is never triggered, indicating that nothing ever gets loaded from an offset that is not dword-aligned. Both push constants and the constant offset pull paths are complex enough that we really don't want to ship without tests. We'll turn the extension back on once we have decent tests.
* intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.Francisco Jerez2017-12-071-6/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | This addresses a long-standing back-end compiler bug that could lead to cross-channel data corruption in loops executed non-uniformly. In some cases live variables extending through a loop divergence point (e.g. a non-uniform break) into a convergence point (e.g. the end of the loop) wouldn't be considered live along all physical control flow paths the SIMD thread could possibly have taken in between due to some channels remaining in the loop for additional iterations. This patch fixes the problem by extending the CFG with physical edges that don't exist in the idealized non-vectorized program, but represent valid control flow paths the SIMD EU may take due to the divergence of logical threads. This makes sense because the i965 IR is explicitly SIMD, and it's not uncommon for instructions to have an influence on neighboring channels (e.g. a force_writemask_all header setup), so the behavior of the SIMD thread as a whole needs to be considered. No changes in shader-db. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>