summaryrefslogtreecommitdiffstats
path: root/src/intel
Commit message (Collapse)AuthorAgeFilesLines
* genxml: Fix PIPELINE_SELECT on G45/Ironlake.Kenneth Graunke2017-11-162-2/+2
| | | | | | | | | Original 965 sets bits 28:27 to 0, while G45 and later set it to 1. Note that the G45 docs are incorrect in this regard - see the DevCTG+ note in the Ironlake PRMs. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel: Drop mtypes.h include from brw_compiler.h.Kenneth Graunke2017-11-151-1/+0
| | | | This isn't necessary and causes trouble for a project I'm working on.
* i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.Kenneth Graunke2017-11-155-128/+0
| | | | | | | | | | | | We use the same hardware mechanism for both atomic counters and SSBO atomics, so there's really no benefit to maintaining separate code to handle each case. Instead, we can just use Rob's shiny new NIR pass to convert atomic_uints to SSBOs, and delete piles of code. The ssbo_start section of the binding table becomes a combined ABO and SSBO section, with ABOs first, then SSBOs. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/gen10: Enable float blend optimizationAnuj Phogat2017-11-141-0/+12
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/genxml: Add Cache Mode SubSlice Register to gen10.xmlAnuj Phogat2017-11-141-0/+12
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* anv/gen10: Implement WaSampleOffsetIZ workaroundAnuj Phogat2017-11-141-0/+61
| | | | | | | | | We already have this workaround in OpenGL driver. See Mesa commit 3cf4fe2219. Signed-off-by: Anuj Phogat <[email protected]> Cc: Nanley Chery <[email protected]> Cc: Rafael Antognolli <[email protected]>
* Revert "intel/fs: Use a pure vertical stride for large register strides"Matt Turner2017-11-141-13/+3
| | | | | | | This reverts commit e8c9e65185de3e821e1e482e77906d1d51efa3ec. With the actual bug fixed (by commit 6ac2d1690192), this is not necessary. I'm doubtful of its correctness in any case.
* i965/fs: Fix extract_i8/u8 to a 64-bit destinationMatt Turner2017-11-141-2/+23
| | | | | | | | | | | | | | | The MOV instruction can extract bytes to words/double words, and words/double words to quadwords, but not byte to quadwords. For unsigned byte to quadword, we can read them as words and AND off the high byte and extract to quadword in one instruction. For signed bytes, we need to first sign extend to word and the sign extend that word to a quadword. Fixes the following test on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLKMatt Turner2017-11-141-4/+4
| | | | | | | Fixes the following tests on CHV, BXT, and GLK: KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
* intel/blorp: Make the MOCS setting part of blorp_addressJason Ekstrand2017-11-134-18/+18
| | | | | | | | This makes our MOCS settings significantly more flexible. Cc: "17.3" <[email protected]> Tested-by: Lyude Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* anv/blorp: Add a device parameter to blorp_surf_for_anv_imageJason Ekstrand2017-11-131-22/+34
| | | | | | Cc: "17.3" <[email protected]> Tested-by: Lyude Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/blorp: Use mocs.tex for depth stencilJason Ekstrand2017-11-131-5/+1
| | | | | | Cc: "17.3" <[email protected]> Tested-by: Lyude Paul <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/tools/error: Decode compute shaders.Kenneth Graunke2017-11-131-7/+42
| | | | | | | | | | | | This is a bit more annoying than your average shader - we need to look at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then hop over to the instruction buffer to decode the program. Now that we store all the buffers before decoding, we can actually do this fairly easily. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools/error: Use do-while for field iterator loops.Kenneth Graunke2017-11-131-6/+6
| | | | | | | | | | | | while loops skip the first field of the instruction/structure, which is not what the code intended. It works out because the field we're looking for doesn't happen to be first, but we ought to do it right regardless. Found while writing the next patch, where Kernel Start Pointer is the first field of INTERFACE_DESCRIPTOR_DATA. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools/error: Decode shaders while decoding batch commands.Kenneth Graunke2017-11-131-85/+49
| | | | | | | | | | | | | This makes aubinator_error_decode's shader dumping work like aubinator. Instead of printing them after the fact, it prints them right inside the 3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the effort of cross-referencing things and jumping back and forth. It also reduces a bunch of book-keeping, and eliminates the limitation that we could only handle 4096 programs. That code was also broken and failed to print any shaders if there were under 4096 programs. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/tools/error: Save error state sections and decode them later.Kenneth Graunke2017-11-131-37/+58
| | | | | | | | | This lets us complete parsing and storing of each buffer's data before we begin decoding the batchbuffer. This makes it possible to inspect the state buffer and program buffer, so we can properly decode any indirect state or shader programs. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Fix null termination of ring name string.Kenneth Graunke2017-11-131-0/+1
| | | | | | Ported from intel_error_decode. We don't want to run off the end. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop unused MAX_RINGS #define.Kenneth Graunke2017-11-131-2/+0
| | | | | | Dead code. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Refactor buffer matching, add more buffers.Kenneth Graunke2017-11-131-62/+30
| | | | | | | | | | Based on a similar patch to intel_error_decode by Chris Wilson. While we're de-duplicating the gtt_offset calculation, we can simplify it to assume two hex digits are there - the kernel has done this since v4.6, and we already require error states from v4.10. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Only decode a few sections of error states.Kenneth Graunke2017-11-131-1/+3
| | | | | | These three are the only we can reasonably decode with genxml. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop unused parameters from decode() helper.Kenneth Graunke2017-11-131-5/+3
| | | | | | | | Also change count from a pointer into a value. We were supposed to be resetting it to 0 (and failed to), but that's gone since we dropped the pre-ascii85 handling. Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Drop support for non-ascii85 encoded error states.Kenneth Graunke2017-11-131-35/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Error state files used to look like: render ring --- gtt_offset = 0x0e8f6000 00000000 : 69040000 00000004 : 79090000 ... 00007ffc : 00000000 --- ringbuffer = 0x00001000 There were thousands of lines between sections. The file format changed with Kernel 4.10, and now has a single ascii85-encoded line following each section heading. This is much easier to parse. There are a bunch of bugs in our handling of the old style format, where we'd decode the wrong data, at the wrong time. Fixing all of these is going to be a giant pain. It's also a lot of extra code complexity. In order to properly decode indirect state, or compute shaders, we'll also need to parse data in advance of decoding, which is going to be a giant pain with this ad-hoc "decode everywhere!" mentality. So, let's just drop support for the older file format. This unfortunately requires an error state generated by Kernel 4.10 or later. That's probably not the end of the world, as we encourage users to upgrade to the latest kernel when encountering GPU hangs anyway. It might be a giant pain for people with LTS kernels, though... Reviewed-by: Chris Wilson <[email protected]>
* intel/tools/error: Do ascii85 decode first.Kenneth Graunke2017-11-131-31/+29
| | | | | | | | | | The dashes "---" may occur within an ascii85 block, but only an ascii85 block starts with ':' or '~'. Ported from Chris Wilson's intel-gpu-tools commit: bceec7e1d8a160226b783c6344eae8cbf4ece144 Reviewed-by: Chris Wilson <[email protected]>
* meson: Don't build intel shared components by defaultDylan Baker2017-11-133-5/+0
| | | | | | | | It's a neat idea, and still useful in some cases, but the intel common code is used by i965 and anvil only, this is a little clearer. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* aubinator: Don't skip the first field in each subgroupJason Ekstrand2017-11-131-2/+3
| | | | | | | | | | The previous iteration algorithm would advance the field pointer right after we advance the group. This meant that you would end up with skipping the first field of the group. In the common case, where the only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get skipped entirely. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/genxml: Delete empty groupsJason Ekstrand2017-11-134-8/+0
| | | | | | | | | They serve no purpose other than to just fill empty space in the packet so each dword has something. Just disallowing empty groups is a bit easier on some of the tools. This does not change the generated packing headers in any way. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Don't crash on invalid heap sizes when the PCI ID is overridenJason Ekstrand2017-11-131-0/+12
|
* intel/tools: Fix detection of enabled shader stages.Kenneth Graunke2017-11-121-1/+1
| | | | | | | | | | We renamed "Function Enable" to "Enable", which broke our detection of whether shaders are enabled or not. So, we'd see a bunch of HS/DS packets with program offsets of 0, and think that was a valid TCS/TES. Fixes: c032cae9ff77e (genxml: Rename "Function Enable" to "Enable".) Reviewed-by: Lionel Landwerlin <[email protected]>
* autotools: Set C++ visibility flags on IntelDylan Baker2017-11-101-0/+3
| | | | | | | | | These flags are set for C sources, but not C++. This causes symbol visibility leaks from the C++ parts of the Intel compiler. Fixes: 700bebb958e93f4d ("i965: Move the back-end compiler to src/intel/compiler") Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* anv/meson: Generate dev_icd.jsonChad Versace2017-11-091-0/+12
| | | | | | | I tested this in a setup where the builddir was outside of the srcdir. Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv: Fix architecture in intel_icd.{arch}.jsonChad Versace2017-11-091-1/+1
| | | | | | | | | | | Use the host arch, not the target arch. In Meson and in recent Autotools, the host arch is where the binary will be used. The target arch is useful only when compiling a compiler. See: http://mesonbuild.com/Cross-compilation.html See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html Reported-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* anv: Refactor anv_GetImageSubresourceLayout()Chad Versace2017-11-091-21/+11
| | | | | | | | Its helper function, anv_surface_get_subresource_layout(), was not very helpful. So fold it into the main function. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/image: Refactor choice of isl_tiling_flags_tChad Versace2017-11-091-13/+31
| | | | | | | | | | Instead of choosing the tiling flags inside make_surface(), which is called once per aspect in a loop, and which chooses the same tiling for each aspect, choose the tiling flags exactly once before entering the aspect loop. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Refactor anv_get_format_plane() - explicit unsupportedChad Versace2017-11-091-4/+4
| | | | | | | | | | | | The same local variable, 'plane_format', was returned on success *and* failure. Be more explicit in distinguishing the two cases: return 'plane_format' on success and return 'unsupported' on failure. This simplifies the diff in upcoming patches for VK_EXT_image_drm_format_modifier. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Remove anv_physical_device_get_format_properties()Chad Versace2017-11-091-23/+13
| | | | | | | | Fold its body into its sole caller, anv_GetPhysicalDeviceFormatProperties(). Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Simplify anv_physical_device_get_format_properties()Chad Versace2017-11-091-16/+9
| | | | | | | | | Now that get_image_format_properties() returns the correct VkFormatFeatureFlags, we can remove the unneeded if-branch and some local variables. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Simplify anv_get_image_format_properties()Chad Versace2017-11-091-14/+3
| | | | | | | | | Now that get_image_format_features() has a VkImageTiling parameter, we can bypass anv_physical_device_get_format_properties() and call get_image_format_features() directly. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Rename get_image_format_properties()Chad Versace2017-11-091-12/+12
| | | | | | | | | | | | | The name is misleading. It looks like vkGetPhysicalDeviceImageFormatProperties(), but it actually implement vkGetPhysicalDeviceFormatProperties. Let's rename it to what it actually does, get_image_format_features(), because it returns VkFormatFeatureFlags. For consistency, also rename get_buffer_format_properties() to get_buffer_format_features(). Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Fix get_image_format_properties() - YCbCrChad Versace2017-11-091-46/+40
| | | | | | | | | | | | Teach it to calculate the format features for YCbCr. The goal (which is completed in this patch) is to incrementally fix get_image_format_properties() to return a correct result. Previously, it returned incorrect VkFormatFeatureFlags which the caller needed clean up. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Fix get_image_format_properties() - 3-channel formatsChad Versace2017-11-091-19/+15
| | | | | | | | | | | Teach it to calculate the format features for 3-channel formats. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Refactor get_image_format_properties() - Reduce paramsChad Versace2017-11-091-11/+21
| | | | | | | | | | | | Replace parameters 'enum isl_format' and 'struct anv_format_plane' with new parameter 'const struct anv_format *'. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Refactor get_image_format_properties() - base_isl_formatChad Versace2017-11-091-3/+4
| | | | | | | Rename parameter 'base' to 'base_isl_format'. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Refactor get_image_format_properties() - plane_formatChad Versace2017-11-091-8/+8
| | | | | | | Rename parameter 'format' to 'plane_format'. Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Refactor get_image_format_properties() - ASTCChad Versace2017-11-091-4/+5
| | | | | | | | | | | | | Teach it to calculate the format features for ASTC. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. v2: New commit message Reviewed-by: Jason Ekstrand <[email protected]> (v1) Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Refactor get_image_format_properties() - depthstencil (v2)Chad Versace2017-11-091-16/+31
| | | | | | | | | | | | | Teach it to calculate the features of depthstencil formats. The goal is to incrementally fix get_image_format_properties() to return a correct result. Currently, it returns incorrect VkFormatFeatureFlags which the caller must clean up. v2: New commit message Reviewed-by: Jason Ekstrand <[email protected]> (v1) Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Better types for 'aspect' function paramsChad Versace2017-11-093-13/+5
| | | | | | | | | Some functions have a comment that says "Exactly one bit must be in 'aspect'". So change the type of their 'aspect' parameter from VkImageAspectFlags to VkImageAspectFlagBits. Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Refactor get_buffer_format_properties()Chad Versace2017-11-091-15/+29
| | | | | | | | | | | Make it a stand-alone function. Pre-patch, for some formats the function returned incorrect VkFormatFeatureFlags which were cleaned up by the caller. This prepares for a cleaner implementation of VK_EXT_image_drm_format_modifier. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: fix build failureNicolai Hähnle2017-11-091-2/+2
| | | | Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
* intel/nir: Use the correct indirect lowering masks in link_shadersJason Ekstrand2017-11-081-6/+4
| | | | | | | | | Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't lower indirects on the fragment shader which is wrong. Instead of using a single indirect mask, take advantage of our new little helper. Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com> Cc: [email protected]
* mesa: Add new fast mtx_t mutex type for basic use casesTimothy Arceri2017-11-091-23/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: "Futexes Are Tricky" http://www.akkadia.org/drepper/futex.pdf We implement "mutex3", which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, simple_mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Signed-off-by: Kristian Høgsberg <[email protected]> Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>