summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* meson: Use dependencies for nirDylan Baker2018-01-113-12/+15
| | | | | | | | | | | | | | | | | This creates two new internal dependencies, idep_nir_headers and idep_nir. The former encapsulates the generation of nir_opcodes.h and nir_builder_opcodes.h and adding src/compiler/nir as an include path. This ensures that any target that needs nir headers will have the includes and that the generated headers will be generated before the target is build. The second, idep_nir, includes the first and additionally links to libnir. This is intended to make it easier to avoid race conditions in the build when using nir, since the number of consumers for libnir and it's headers are quite high. Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* meson: don't use intermediate variables that are immediately discardedDylan Baker2018-01-113-10/+5
| | | | | | | | | | | | | | | | For things like: loop x = func() list += x end just do: loop list += func() end Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* meson: Use consistent style for testsDylan Baker2018-01-113-26/+33
| | | | | | | Don't use intermediate variables, use consistent whitespace. Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* meson: Use consistent styleDylan Baker2018-01-112-32/+56
| | | | | | | | | | | | | | | | | | | | Currently the meosn build has a mix of two styles: arg : [foo, ... bar], and arg : [ foo, ..., bar, ] For consistency let's pick one. I've picked the later style, which I think is more readable, and is more common in the mesa code base. v2: - fix commit message Acked-by: Eric Engestrom <[email protected]> Signed-off-by: Dylan Baker <[email protected]>
* i965: Use UD types for gl_SampleID setupJason Ekstrand2018-01-111-3/+3
| | | | | | | | We already had to switch all of the W types to UW to prevent issues with vector immediates on gen10. We may as well use unsigned types everywhere. Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use UW types when using V immediatesJason Ekstrand2018-01-112-5/+5
| | | | | | | | | | | | | | | | | | | | | | Gen 10 has a strange hardware bug involving V immediates with W types. It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2 getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom four nibbles are repeated instead of the top four being taken. (A mov of 0x00003210V yields the same result.) This bug does not appear in any hardware documentation as far as we can tell and the simulator does not implement the bug either. Commit 6132992cdb858268af0e985727d80e4140be389c was mostly a no-op except that it changed the type of the subgroup invocation from UW to W and caused us to tickle this bug with basically every compute shader that uses any sort of invocation ID (which is most of them). This is also potentially an issue for geometry shader input pulls and SampleID setup. The easy solution is just to change the few places where we use a vector integer immediate with a W type to use a UW type. Reviewed-by: Matt Turner <[email protected]> Cc: [email protected] Fixes: 6132992cdb858268af0e985727d80e4140be389c
* Revert "Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+""Matt Turner2018-01-111-4/+8
| | | | | | This reverts commit 2d0457203871c843ebfc90fb895b65a9b14cd9bb. Acked-by: Scott D Phillips <[email protected]>
* i965/fs: Add/use functions to convert to 3src_align1 vstride/hstrideMatt Turner2018-01-111-28/+41
| | | | | | | | | | Some cases weren't handled, such as stride 4 which is needed for 64-bit operations. Presumably fixes the assertion failure mentioned in commit 2d0457203871 (Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+") but who can really say since the commit neglected to list any of them! Reviewed-by: Scott D Phillips <[email protected]>
* anv: Make sure state on primary is correct after CmdExecuteCommandsAlex Smith2018-01-111-0/+9
| | | | | | | | | | | | | | | | | After executing a secondary command buffer, we need to update certain state on the primary command buffer to reflect changes by the secondary. Otherwise subsequent commands may not have the correct state set. This fixes various issues (rendering errors, GPU hangs) seen after executing secondary command buffers in some cases. v2 (Jason Ekstrand): - Reset to invalid values instead of pulling from the secondary - Change the comment to be more descriptive Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Cc: [email protected]
* anv: Import mako templates only during execution of anv_extensionsAndres Gomez2018-01-111-5/+5
| | | | | | | | | | | | | | | | | | anv_extensions usage from anv_icd was bringing the unwanted dependency of mako templates for the latter. We don't want that since it will force the dependency even for distributable tarballs which was not needed until now. Jason suggested this approach. v2: Patch simplification (Jason). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551 Fixes: 0ab04ba979b ("anv: Use python to generate ICD json files") Cc: Jason Ekstrand <[email protected]> Cc: Emil Velikov <[email protected]> Signed-off-by: Andres Gomez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix maxDescriptorSet* limitsSamuel Iglesias Gonsálvez2018-01-111-5/+5
| | | | | | | | | | | | | | | "The maxDescriptorSet* limit is n times the corresponding maxPerStageDescriptor* limit, where n is the number of shader stages supported by the VkPhysicalDevice. If all shader stages are supported, n = 6 (vertex, tessellation control, tessellation evaluation, geometry, fragment, compute)." Fixes: dEQP-VK.api.info.device.properties Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/nir: add a helper to lower gl_PatchVerticesIn to a uniformIago Toral Quiroga2018-01-101-0/+2
| | | | | | | | | | | | v2: do not try to handle it as a system value directly for the SPIR-V path. In GL we rather handle it as a uniform like we do for the GLSL path (Jason). v3: - Remove the uniform variable, it is alwats -1 now (Jason) - Also do the lowering for the TessEval stage (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* genxml: Add missing INSTDONE_1 bits on Gen7.5+.Kenneth Graunke2018-01-094-0/+8
| | | | | | This will make aubinator_error_decode decode them properly. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Apply Geminilake "Barrier Mode" workaround.Kenneth Graunke2018-01-092-0/+29
| | | | | | | | | | | | | | | | | | | Apparently, Geminilake requires you to whack a chicken bit to select either compute or tessellation mode for barriers. The recommendation is to switch between them at PIPELINE_SELECT time. We may not need to do this all the time, but I don't know that it hurts either. PIPELINE_SELECT is already a pretty giant stall. This appears to fix hangs in tessellation control shaders with barriers on Geminilake. Note that this requires a corresponding kernel change, drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. in order for the register write to actually happen. Without an updated kernel, this register write will be noop'd and the fix will not work. Reviewed-by: Rafael Antognolli <[email protected]>
* aubinator: add support for aubinating memtrace aubsScott D Phillips2018-01-081-35/+83
| | | | | | | | | Memtrace aubs are similar to classic aubs, with the major difference being how command submission is serialized (as register writes instead of a high-level submit message). Some internal tools generate or consume only memtrace aubs. Reviewed-by: Jordan Justen <[email protected]>
* aubinator: extract aubinator_init() out of the header handler functionScott D Phillips2018-01-081-16/+23
| | | | | | | A later patch will use the aubinator_init() function from the memtrace aub header handler. Reviewed-by: Jordan Justen <[email protected]>
* aubinator: honor --color option when printing the headerScott D Phillips2018-01-081-1/+5
| | | | Reviewed-by: Jordan Justen <[email protected]>
* anv: Allow PMA optimization to be enabled in secondary command buffersAlex Smith2018-01-081-1/+21
| | | | | | | | | | | | | | | | | This was never enabled in secondary buffers because hiz_enabled was never set to true for those. If the app provides a framebuffer in the inheritance info when beginning a secondary buffer, we can determine if HiZ is enabled and therefore allow the PMA optimization to be enabled within the command buffer. This improves performance by ~13% on an internal benchmark on Skylake. v2: Use anv_cmd_buffer_get_depth_stencil_view(). Signed-off-by: Alex Smith <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Take write mask into account in has_color_buffer_write_enabledAlex Smith2018-01-051-9/+18
| | | | | | | | | | | | | | | | | | | | If we have a color attachment, but its writes are masked, this would have still returned true. This is inconsistent with how HasWriteableRT in 3DSTATE_PS_BLEND is set, which does take the mask into account. This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA if the fragment shader does use UAVs, meaning the fragment shader may not be invoked because HasWriteableRT is false. Specifically, this was seen to occur when the shader also enables early fragment tests: the fragment shader was not invoked despite passing depth/stencil. Fix by taking the color write mask into account in this function. This is consistent with how things are done on i965. Signed-off-by: Alex Smith <[email protected]> Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add missing unlock in anv_scratch_pool_allocAlex Smith2018-01-041-1/+3
| | | | | | | | | Fixes hangs seen due to the lock not being released here. Signed-off-by: Alex Smith <[email protected]> Cc: [email protected] Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Drop support for the legacy SNORM -> Float equation.Kenneth Graunke2018-01-029-44/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older OpenGL defines two equations for converting from signed-normalized to floating point data. These are: f = (2c + 1)/(2^b - 1) (equation 2.2) f = max{c/2^(b-1) - 1), -1.0} (equation 2.3) Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be used in all scenarios, and remove equation 2.2. DirectX uses equation 2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+ systems that use the vertex fetcher hardware to do the conversions always get formula 2.3. This can make a big difference for 10-10-10-2 formats - the 2-bit value can represent 0 with equation 2.3, and cannot with equation 2.2. Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES. Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use the new rules, at least in core profile. That would leave Gen4-6 doing something different than all other hardware, which seems...lame. With context version promotion, applications that requested a pre-4.2 context may get promoted to 4.2, and thus get the new rules. Zero cases have been reported of this being a problem. However, we've received a report that following the old rules breaks expectations. SuperTuxKart apparently renders the cars red when following equation 2.2, and works correctly when following equation 2.3: https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405 So, this patch deletes the legacy equation 2.2 support entirely, making all hardware and APIs consistently use the new equation 2.3 rules. If we ever find an application that truly requires the old formula, then we'd likely want that application to work on modern hardware, too. We'd likely restore this support as a driconf option. Until then, drop it. This commit will regress Piglit's draw-vertices-2101010 test on pre-Haswell without the corresponding Piglit patch to accept either formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1): draw-vertices-2101010: Accept either SNORM conversion formula. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Combine {VS,FS}_OPCODE_GET_BUFFER_SIZE opcodes.Kenneth Graunke2017-12-308-19/+14
| | | | | | These are the same, we don't need a separate opcode enum per backend. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/device: Mark all state buffers as needing captureJason Ekstrand2017-12-281-3/+3
| | | | | | | Previously, we were flagging the instruction state buffer for capture but not surface state or dynamic state. We want those captured too. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/aubinator: Gracefully handle dynamic state not being availableJason Ekstrand2017-12-281-0/+5
| | | | | | | | Some older versions of the Vulkan driver didn't properly tag dynamic state as needing to be captured. Also, this prevents crashes when looking at dumps on older kernels. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/aubinator: Free section data lastJason Ekstrand2017-12-281-2/+4
| | | | | | | | | We were walking the sections, printing the batches, and then freeing them in one pass. If the batch happens to reference any earlier sections (which it almost certainly will since it's at the end), we will access freed memory. Reviewed-by: Lionel Landwerlin <[email protected]>
* Revert "i965/fs: Use align1 mode on ternary instructions on Gen10+"Anuj Phogat2017-12-221-8/+4
| | | | | | | | | | | This reverts commit 9cd60fce9c22737000a8f8dc711141f8a523fe75. Above commit caused 2000+ piglit tests to assert fail. Disabling the align1 mode on gen10 for now to avoid failures. Cc: Matt Turner <[email protected]> Cc: Rafael Antognolli <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Rafael Antognolli <[email protected]>
* intel/fs: Initialize fs_visitor::grf_used on construction.Francisco Jerez2017-12-211-0/+1
| | | | | | | | | | | | | | | This should shut up some Valgrind errors during pre-regalloc scheduling. The errors were harmless since they could only have led to the estimation of the bank conflict penalty of an instruction pre-regalloc, which is inaccurate at that point of the program compilation, but no less accurate than the intended "return 0" fall-back path. The scheduling pass is normally re-run after regalloc with a well-defined grf_used value and accurate bank conflict information. Fixes: acf98ff933d "intel/fs: Teach instruction scheduler about GRF bank conflict cycles." Reported-by: Eero Tamminen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* intel/fs/bank_conflicts: Use posix_memalign() instead of overaligned new to ↵Francisco Jerez2017-12-211-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | obtain vector storage. The weight_vector_type constructor was inadvertently assuming C++17 semantics of the new operator applied on a type with alignment requirement greater than the largest fundamental alignment. Unfortunately on earlier C++ dialects the implementation was allowed to raise an allocation failure when the alignment requirement of the allocated type was unsupported, in an implementation-defined fashion. It's expected that a C++ implementation recent enough to implement P0035R4 would have honored allocation requests for such over-aligned types even if the C++17 dialect wasn't active, which is likely the reason why this problem wasn't caught by our CI system. A more elegant fix would involve wrapping the __SSE2__ block in a '__cpp_aligned_new >= 201606' preprocessor conditional and continue taking advantage of the language feature, but that would yield lower compile-time performance on old compilers not implementing it (e.g. GCC versions older than 7.0). Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226 Reported-by: Józef Kucia <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* anv: disallow VK_REMAINING_ARRAY_LAYERS in vkCmdClearAttachments()Samuel Iglesias Gonsálvez2017-12-201-0/+2
| | | | | | | | Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed in the passed VkClearRect struct. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler/gen10: Disable push constants.Rafael Antognolli2017-12-191-0/+9
| | | | | | | | | | | | We still have gpu hangs on Cannonlake when using push constants, so disable them for now until we have a proper fix for these hangs. v2: Add warning message when creating context too. Signed-off-by: Rafael Antognolli <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* anv: Remove unused variable.Bas Nieuwenhuizen2017-12-171-2/+0
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* isl: Don't require VALIGN_2 for R32G32B32_FLOAT on Haswell.Kenneth Graunke2017-12-151-1/+3
| | | | | | | | | | | | | | According to the RENDER_SURFACE_STATE internal documentation, the R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply it to Ivybridge and Baytrail, but not Haswell. Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell. Changes these tests from crashing to skipping on Haswell: - KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f - KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f Reviewed-by: Jason Ekstrand <[email protected]>
* intel/tools: Convert aubinator over to the common frameworkJason Ekstrand2017-12-143-690/+33
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode registersJason Ekstrand2017-12-141-0/+13
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode dynamic stateJason Ekstrand2017-12-141-0/+81
| | | | | | | | Unfortunately, in aubinator and aubinator_error_decode we don't always know how many of a given state we have, so we must guess. One day, we'll come up with a way to annotate the batch to solve this problem. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode constants, binding tables, and samplersJason Ekstrand2017-12-141-0/+73
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Switch aubinator_error_decode over to the gen_print_batchJason Ekstrand2017-12-143-205/+37
| | | | | | | The shared framework can now do everything that aubinator_error_decode ever did and more. It's time to make the switch. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode graphics shadersJason Ekstrand2017-12-141-0/+95
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode vertex and index buffersJason Ekstrand2017-12-142-0/+161
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/batch-decoder: Decode MEDIA_INTERFACE_DESCRIPTOR_LOADJason Ekstrand2017-12-141-0/+145
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools: Add the start of a generic batch decoderJason Ekstrand2017-12-142-0/+306
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Expose the raw field value in the iteratorJason Ekstrand2017-12-142-1/+3
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/disasm: Take a devinfo in gen_disasm_createJason Ekstrand2017-12-144-8/+7
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Take a bit offset in gen_print_groupJason Ekstrand2017-12-144-18/+22
| | | | | | | | | | Previously, if a group was nested in another group such that it didn't start on a dword boundary, we would decode it as if it started at the start of its first dword. This changes things to work even more in terms of bits so that we can properly decode these structs. This affects MOCS, attribute swizzles, and several other things. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Stop rounding down to the nearest dwordJason Ekstrand2017-12-141-11/+12
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Convert the iterator to work entirely in bitsJason Ekstrand2017-12-142-12/+9
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/decoder: Drop gen_field_decode helperJason Ekstrand2017-12-142-11/+0
| | | | | | It's unused Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.Francisco Jerez2017-12-123-7/+19
| | | | | | | Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199 Reported-by: Darius Spitznagel <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv: fix bug when using component qualifier in FS outputsSamuel Iglesias Gonsálvez2017-12-121-19/+44
| | | | | | | | | | | | | | | | | | | | | | | We can write to the same output but in different components, like in this example: layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0; layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1; Therefore, they are not two different outputs but only one. Fixes: dEQP-VK.glsl.440.linkage.varying.component.frag_out.* v3: - Remove FRAG_RESULT_MAX. - Add const and use sizeof (Ian). - Do three-pass to set properly the locations of fragment outputs when having arrays (Jason). Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Enable UBO pushingJason Ekstrand2017-12-082-0/+7
| | | | | | | | | | | | | Push constants on Intel hardware are significantly more performant than pull constants. Since most Vulkan applications don't actively use push constants on Vulkan or at least don't use it heavily, we're pulling way more than we should be. By enabling pushing chunks of UBOs we can get rid of a lot of those pulls. On my SKL GT4e, this improves the performance of Dota 2 and Talos by around 2.5% and improves Aztec Ruins by around 2%. Reviewed-by: Jordan Justen <[email protected]>