path: root/src/intel
Commit message | Author | Age | Files | Lines
* nir: Rework conversion opcodes | Jason Ekstrand | 2017-03-14 | 5 | -77/+55
  The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent:
  - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>.
  - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting.
  - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>.
  Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes.
  Reviewed-by: Eric Anholt
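The naming scheme above can be illustrated with a small sketch that composes an opcode name from a source type, a destination type, and a destination bit size. The enum and the conversion_name() helper are hypothetical illustrations, not NIR's mako-generated nir_type_conversion_op().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical base types; NIR's real nir_alu_type is richer. */
enum base_type { TYPE_INT, TYPE_UINT, TYPE_FLOAT, TYPE_BOOL };

static char type_letter(enum base_type t)
{
   switch (t) {
   case TYPE_INT:   return 'i';
   case TYPE_UINT:  return 'u';
   case TYPE_FLOAT: return 'f';
   default:         return 'b';
   }
}

/* Hypothetical helper: writes "<src>2<dst><size>" (e.g. "i2f32") for
 * non-bool conversions and "<src>2<dst>" (e.g. "f2b") for bool ones.
 * Returns 0 on success, -1 for pairs the scheme no longer provides. */
static int conversion_name(enum base_type src, enum base_type dst,
                           unsigned dst_bits, char *buf, size_t len)
{
   /* i2u and u2i were removed: integer <-> integer is only i2i or u2u,
    * and the source signedness decides whether to sign-extend. */
   if ((src == TYPE_INT && dst == TYPE_UINT) ||
       (src == TYPE_UINT && dst == TYPE_INT))
      return -1;

   int n;
   if (src == TYPE_BOOL || dst == TYPE_BOOL)
      n = snprintf(buf, len, "%c2%c", type_letter(src), type_letter(dst));
   else
      n = snprintf(buf, len, "%c2%c%u", type_letter(src), type_letter(dst),
                   dst_bits);
   return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

Encoding only the destination size in the name works because the source size is already known from the operand; this sketch does not model that part.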
* i965/fs: Re-arrange conversion operations | Jason Ekstrand | 2017-03-14 | 1 | -36/+31
  Reviewed-by: Topi Pohjolainen
* i965/vec4: Get rid of the type parameter from to/from_double | Jason Ekstrand | 2017-03-14 | 2 | -24/+15
  Reviewed-by: Topi Pohjolainen
* i965/fs: Use num_components from the SSA def in image intrinsics | Jason Ekstrand | 2017-03-14 | 1 | -2/+1
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Connor Abbott
* anv/blorp: Only set a clear color for resolves if fast-cleared | Jason Ekstrand | 2017-03-14 | 1 | -1/+2
  Reviewed-by: Eric Anholt
  Reviewed-by: Kenneth Graunke
  Reviewed-by: Connor Abbott
  Cc: "17.0"
* anv/blorp: Turn off AUX after doing a CCS_D resolve | Jason Ekstrand | 2017-03-14 | 1 | -0/+2
  For render passes with multiple subpasses on gen7, we only fast-clear at the top, but an input attachment use can cause us to do a resolve in the middle of the render pass. Once we've done so, we no longer have a fast-cleared surface, so we can just set aux_usage to NONE.
  Reviewed-by: Topi Pohjolainen
  Cc: "17.0"
* android: add '/vulkan' to libmesa_anv_entrypoints path | Tapani Pälli | 2017-03-14 | 1 | -2/+2
  Otherwise, the generated entrypoint headers are not found during the build.
  Signed-off-by: Tapani Pälli
  Reviewed-by: Emil Velikov
* android: add src/intel/compiler to libmesa_intel_compiler includes | Tapani Pälli | 2017-03-14 | 1 | -0/+1
  Fixes a build error where brw_nir.h is not found from the generated file brw_nir_trig_workarounds.c.
  Signed-off-by: Tapani Pälli
  Reviewed-by: Emil Velikov
* anv: Add missing error-checking to anv_CreateDevice (v3) | Gwan-gyeong Mun | 2017-03-13 | 1 | -9/+56
  This patch adds missing error-checking and fixes resource leaks in the allocation failure path of anv_CreateDevice().
  v2: Fixes from Jason Ekstrand's review
    a) Add missing destructors for all of the state pools on the allocation failure path
    b) Add missing destructor for the batch bo pools on the allocation failure path
  v3: Fixes from Emil Velikov's review
    Add missing destructors for the queue and scratch_pool on the allocation failure path
  Signed-off-by: Mun Gwan-gyeong
  Reviewed-by: Jason Ekstrand
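A common C idiom for this kind of fix is a chain of goto labels that release resources in the reverse order they were acquired, so each failure point frees exactly what already exists. The sketch below uses hypothetical resources and a simulated allocation failure; it is not the actual anv_CreateDevice() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device holding three resources acquired in order. */
struct device {
   void *state_pool;
   void *batch_bo_pool;
   void *scratch_pool;
};

/* Stand-in allocator: fails at step `fail_at` to simulate OOM. */
static void *try_alloc(int step, int fail_at)
{
   return step == fail_at ? NULL : malloc(16);
}

/* Returns 0 on success; on failure, frees only what was acquired. */
static int create_device(struct device *dev, int fail_at)
{
   dev->state_pool = try_alloc(0, fail_at);
   if (!dev->state_pool)
      goto fail;

   dev->batch_bo_pool = try_alloc(1, fail_at);
   if (!dev->batch_bo_pool)
      goto fail_state_pool;

   dev->scratch_pool = try_alloc(2, fail_at);
   if (!dev->scratch_pool)
      goto fail_batch_bo_pool;

   return 0;

   /* Labels unwind in reverse acquisition order. */
fail_batch_bo_pool:
   free(dev->batch_bo_pool);
fail_state_pool:
   free(dev->state_pool);
fail:
   return -1;
}

static void destroy_device(struct device *dev)
{
   free(dev->scratch_pool);
   free(dev->batch_bo_pool);
   free(dev->state_pool);
}
```

Adding a new resource then means adding one allocation, one label, and one free at the matching spot in the chain.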
* anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties | Chad Versace | 2017-03-13 | 1 | -55/+18
  No intended change in behavior. Just a refactor.
  Reviewed-by: Jason Ekstrand
* anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2) | Chad Versace | 2017-03-13 | 1 | -27/+4
  No intended change in behavior. Just a refactor.
  v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For Jason.
  Reviewed-by: Jason Ekstrand
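Both refactors build on the Vulkan "two-call" convention: with a NULL array the function reports the element count; otherwise it writes at most *pCount elements, updates *pCount, and returns VK_INCOMPLETE if the array was too small. The helper below is a hypothetical, simplified stand-in written as plain functions; it does not reproduce the names or macros of Mesa's actual vk_outarray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as VK_SUCCESS and VK_INCOMPLETE. */
enum result { RESULT_SUCCESS = 0, RESULT_INCOMPLETE = 5 };

/* Hypothetical output-array tracker. */
struct outarray {
   int *data;         /* caller's array, or NULL for a count query */
   uint32_t cap;      /* capacity passed in via *p_count */
   uint32_t filled;   /* elements written (or counted if data == NULL) */
   uint32_t wanted;   /* elements the implementation tried to append */
   uint32_t *p_count;
};

static void outarray_init(struct outarray *a, int *data, uint32_t *p_count)
{
   a->data = data;
   a->cap = data ? *p_count : UINT32_MAX;
   a->filled = 0;
   a->wanted = 0;
   a->p_count = p_count;
}

/* Returns a slot to fill, or NULL when counting only or when full. */
static int *outarray_append(struct outarray *a)
{
   a->wanted++;
   if (a->filled >= a->cap)
      return NULL;
   a->filled++;
   return a->data ? &a->data[a->filled - 1] : NULL;
}

/* Writes back the count and reports whether anything was dropped. */
static enum result outarray_status(struct outarray *a)
{
   *a->p_count = a->filled;
   return a->wanted > a->filled ? RESULT_INCOMPLETE : RESULT_SUCCESS;
}

/* Hypothetical enumerate function exposing three queue family ids. */
static enum result enumerate_queue_families(uint32_t *p_count, int *props)
{
   struct outarray out;
   outarray_init(&out, props, p_count);
   for (int i = 0; i < 3; i++) {
      int *slot = outarray_append(&out);
      if (slot)
         *slot = 100 + i;
   }
   return outarray_status(&out);
}
```

The point of such a helper is that each entrypoint just appends unconditionally and lets one place handle the count query, truncation, and VK_INCOMPLETE.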
* intel: genxml: prevent missing ; with address fields dwords | Lionel Landwerlin | 2017-03-13 | 1 | -28/+26
  Before this change, the generator could print this kind of thing:
      const uint32_t v0 =
         __gen_uint(values->ValidBit, 0, 0) |
         __gen_uint(values->FaultType, 1, 2) |
         __gen_uint(values->SRCIDofFault, 3, 10) |
         __gen_uint(values->GTTSEL, 11, 1) |
      dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);
  This change fixes the trailing '|'.
  Signed-off-by: Lionel Landwerlin
  Reviewed-by: Jason Ekstrand
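The fix amounts to emitting the '|' separator only between terms, never after the last one. A minimal sketch of that join, assuming a hypothetical join_terms() that writes a C expression into a buffer; the real genxml generator is a Python script and is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Joins terms as "t0 |\n   t1 |\n   t2": the separator is written
 * before every term except the first, so no '|' dangles at the end. */
static void join_terms(char *out, size_t len,
                       const char *const *terms, size_t n)
{
   size_t used = 0;
   out[0] = '\0';
   for (size_t i = 0; i < n && used < len; i++) {
      int w = snprintf(out + used, len - used, "%s%s",
                       i > 0 ? " |\n   " : "", terms[i]);
      if (w < 0)
         break;
      used += (size_t)w;   /* stops the loop if output was truncated */
   }
}
```

Writing the separator as a prefix of every term after the first avoids having to know which term is last, which is what made the trailing '|' easy to emit by accident.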
* anv: Use on-the-fly surface states for dynamic buffer descriptors | Jason Ekstrand | 2017-03-13 | 7 | -245/+86
  We have a performance problem with dynamic buffer descriptors. Because we are currently implementing them by pushing an offset into the shader and adding that offset onto the already existing offset for the UBO/SSBO operation, all UBO/SSBO operations on dynamic descriptors are indirect. The back-end compiler implements indirect pull constant loads using what basically amounts to a texelFetch instruction. For pull constant loads with constant offsets, however, we use an oword block read message which goes through the constant cache and reads a whole cache line at a time. Because of these two things, direct pull constant loads are much faster than indirect pull constant loads. Because all loads from dynamically bound buffers are indirect, the user takes a substantial performance penalty when using this "performance" feature.
  There are two potential solutions I have seen for this problem. The alternative solution is to continue pushing offsets into the shader but wire things up in the back-end compiler so that we use the oword block read messages anyway. The only reason we can do this is that we know a priori that the dynamic offsets are uniform and 16-byte aligned. Unfortunately, thanks to the 16-byte alignment requirement of the oword messages, we can't do some general "if the indirect offset is uniform, use an oword message" sort of thing.
  The solution in this commit, however, is recommended for a few reasons:
  1. Surface states are relatively cheap. We've been using on-the-fly surface state setup for some time in GL and it works well. Also, dynamic offsets with on-the-fly surface state should still be cheaper than allocating new descriptor sets every time you want to change a buffer offset, which is really the only requirement of the dynamic offsets feature.
  2. This requires substantially less compiler plumbing. Not only can we delete the entire apply_dynamic_offsets pass, but we can also avoid having to add architecture for passing dynamic offsets to the back-end compiler in such a way that it can continue using oword messages.
  3. We get robust buffer access range-checking for free. Because the offset and range are baked into the surface state, we no longer need to pass ranges around and do bounds-checking in the shader.
  4. Once we finally get UBO pushing implemented, it will be much easier to handle pushing chunks of dynamic descriptors if the compiler remains blissfully unaware of dynamic descriptors.
  This commit improves performance of The Talos Principle on ULTRA settings by around 50% and brings it nicely into line with OpenGL performance.
  Reviewed-by: Lionel Landwerlin
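Point 3 can be illustrated with a sketch. When the dynamic offset is folded into the surface's base address at bind time and the descriptor's range becomes the surface size, an out-of-range access falls outside the surface and can be rejected by a bounds check against the surface size, with no offset or range math in the shader. The structs and functions below are hypothetical simplifications, not anv's surface-state code; access_in_bounds() only models the check conceptually.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a buffer descriptor. */
struct buffer_desc {
   uint64_t base;    /* GPU address of the bound buffer */
   uint64_t offset;  /* static offset from the descriptor write */
   uint64_t range;   /* bytes visible to the shader */
};

/* Hypothetical, simplified surface state: base address plus size. */
struct surface_state {
   uint64_t address;
   uint64_t size;
};

/* Bake the dynamic offset into the surface at bind time, so the shader
 * sees an ordinary binding and can use direct (constant-offset) loads. */
static struct surface_state
make_dynamic_surface(const struct buffer_desc *d, uint32_t dynamic_offset)
{
   struct surface_state s = {
      .address = d->base + d->offset + dynamic_offset,
      .size = d->range,
   };
   return s;
}

/* Conceptual model of a surface bounds check: the whole access
 * [byte_offset, byte_offset + bytes) must lie within the surface. */
static bool access_in_bounds(const struct surface_state *s,
                             uint64_t byte_offset, uint64_t bytes)
{
   return byte_offset <= s->size && bytes <= s->size - byte_offset;
}
```

The check is written as `bytes <= size - byte_offset` rather than `byte_offset + bytes <= size` so that it cannot overflow.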
* anv: Stall before fast-clear operations | Jason Ekstrand | 2017-03-13 | 1 | -6/+19
  During initial CCS bring-up, I discovered that you have to do a full CS stall prior to doing a CCS resolve as well as afterwards. It appears that the same is needed for fast-clears as well. This fixes rendering corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't been demonstrated on any other hardware; however, given that this appears to be a "too many things in the pipe" problem, it makes sense that it is easier to reproduce on a system with more EUs. The issue with resolves is demonstrable on a GT3 or GT2, so this is probably also a problem on all GTs.
  Reviewed-by: Topi Pohjolainen
  Cc: "13.0 17.0"
* anv: Accurately advertise dynamic descriptor limits | Jason Ekstrand | 2017-03-13 | 1 | -2/+2
  The number of dynamic descriptors is limited by both the number of descriptors and the total number of dynamic things. Because there isn't a single "maximum dynamic things" limit, we need to divide by two so that applications can create the maximum number of both UBOs and SSBOs.
  Reviewed-by: Eduardo Lima Mitev
  Cc: "17.0 13.0"
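The halving can be made concrete with a sketch. Assuming a single shared budget of dynamic buffer slots (the MAX_DYNAMIC_BUFFERS value below is hypothetical, not anv's actual constant), advertising the full budget for both UBOs and SSBOs would let an application use up to twice what can actually be bound; splitting it keeps the sum within the budget. The field names echo VkPhysicalDeviceLimits::maxDescriptorSetUniformBuffersDynamic and maxDescriptorSetStorageBuffersDynamic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared budget for all dynamic buffer descriptors. */
#define MAX_DYNAMIC_BUFFERS 16

struct dynamic_limits {
   uint32_t max_uniform_buffers_dynamic;
   uint32_t max_storage_buffers_dynamic;
};

static struct dynamic_limits advertise_limits(void)
{
   /* An application may max out both limits at once, so each limit
    * gets half of the shared budget. */
   struct dynamic_limits l = {
      .max_uniform_buffers_dynamic = MAX_DYNAMIC_BUFFERS / 2,
      .max_storage_buffers_dynamic = MAX_DYNAMIC_BUFFERS / 2,
   };
   return l;
}
```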
* anv: Add a helper for working with VK_WHOLE_SIZE for buffers | Jason Ekstrand | 2017-03-13 | 4 | -11/+28
  Reviewed-by: Plamena Manolova
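In Vulkan, a range of VK_WHOLE_SIZE means "from the offset to the end of the buffer", so a helper like this typically resolves it to a concrete byte count. A minimal sketch, assuming a hypothetical buffer_get_range() that takes the buffer size directly; the helper added by this commit is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as VK_WHOLE_SIZE in the Vulkan headers (~0ULL). */
#define WHOLE_SIZE (~0ULL)

/* Hypothetical helper: turn an (offset, range) pair into a byte count. */
static uint64_t buffer_get_range(uint64_t buffer_size, uint64_t offset,
                                 uint64_t range)
{
   assert(offset <= buffer_size);
   if (range == WHOLE_SIZE)
      return buffer_size - offset;   /* the rest of the buffer */
   assert(range <= buffer_size - offset);
   return range;
}
```

Centralizing this avoids each caller repeating the special case, which is easy to forget when the range is later used for a surface size or bounds check.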
* aubinator/genxml: use gzipped files to store embedded genxml | Lionel Landwerlin | 2017-03-13 | 3 | -17/+66
  This reduces the size of the aubinator binary from ~1.4MB to ~700KB. We can now drop the checks for xxd in configure.
  v2: Fix incorrect makefile dependency (Lionel)
  v3: use $(PYTHON2) (Emil)
  Signed-off-by: Lionel Landwerlin
  Reviewed-by: Emil Velikov
* intel: genxml: add script to generate gzipped genxml | Lionel Landwerlin | 2017-03-13 | 2 | -0/+48
  v2 (from Dylan):
    - Add main function
    - Add missing Copyright
    - Use print_function
  v3: Add actual license (Dylan)
  Signed-off-by: Lionel Landwerlin
  Reviewed-by: Dylan Baker
  Reviewed-by: Emil Velikov
* intel: fix compiler build | Iago Toral Quiroga | 2017-03-13 | 1 | -0/+7
      compiler/brw_vec4_gs_visitor.cpp:744:39: error: ‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
         output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
  Fixes: d0d4a5f43b4 ("i965: split EU defines to brw_eu_defines.h")
  Reviewed-by: Emil Velikov
* intel/vulkan: Get rid of recursive make | Jason Ekstrand | 2017-03-13 | 7 | -301/+305
  v2 [Emil Velikov]
    - Various fixes and initial stab at the Android build.
    - Keep the generation rules/EXTRA_DIST outside the conditional
  Reviewed-by: Jason Ekstrand
* intel/tools: Use a makefile included from intel/Makefile.am | Jason Ekstrand | 2017-03-13 | 2 | -37/+19
  Reviewed-by: Lionel Landwerlin
  Reviewed-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* intel/compiler: whitespace cleanups | Emil Velikov | 2017-03-13 | 2 | -5/+0
  Signed-off-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* intel/compiler: link all tests against gtest, even test_eu_compact | Emil Velikov | 2017-03-13 | 2 | -27/+13
  At the moment, all the tests but test_eu_compact are actual C++ gtests. To simplify things, we can move gtest.la to the common TEST_LIBS. While we're here, we can change the test extension [to .cpp] to avoid using the confusing dummy.cpp. Add a nice comment in the makefile for posterity.
  Signed-off-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* i965: Move the back-end compiler to src/intel/compiler | Jason Ekstrand | 2017-03-13 | 112 | -23/+63984
  Mostly a dummy git mv with a couple of noticeable parts:
  - With the earlier header cleanups, nothing in src/intel depends on files from src/mesa/drivers/dri/i965/
  - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter
  - brw_util.[ch] is not really compiler specific, so it's moved to i965.
  v2:
    - move brw_eu_defines.h instead of brw_defines.h
    - remove no-longer applicable includes
    - add missing vulkan/ prefix in the Android build (thanks Tapani)
  v3:
    - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
    - rebase on top of the oa patches
  [Emil Velikov: commit message, various small fixes throughout]
  Signed-off-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* i965: split EU defines to brw_eu_defines.h | Emil Velikov | 2017-03-13 | 1 | -1/+1
  Split out the EU defines from the 'generic' ones, as the former are more compiler oriented. With a later commit, we'll move brw_eu_defines.h alongside the compiler infra to src/intel/. Pulling all the defines in there seems overzealous.
  Some defines are used by both i965 and the i965 compiler. Those are moved to brw_eu_defines.h and annotated accordingly. The i965 users were updated to have the extra include to indicate that. With future work we might provide a better split, but for now this seems reasonable.
  Cc: Kenneth Graunke
  Signed-off-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* anv: Stop including brw_context.h | Jason Ekstrand | 2017-03-13 | 1 | -1/+1
  Reviewed-by: Jason Ekstrand
* intel/isl: Stop linking libi965_compiler.la into tests | Jason Ekstrand | 2017-03-13 | 1 | -1/+0
  Reviewed-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* vulkan/wsi: Generate wayland protocol headers separately from EGL | Jason Ekstrand | 2017-03-13 | 1 | -7/+0
  Previously, we were depending on EGL for generating the headers and providing the protocol symbols. However, since neither Vulkan driver actually wants to link against EGL, this is kind of pointless. It also creates a weird build dependency.
  v2 [Jason]
    - Add missing wsi/ prefix, MKDIR_GEN
  v3 [Emil Velikov]
    - include BUILT_SOURCES/generation rules outside of conditional
  Reviewed-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* anv/wsi: Don't include wayland headers | Jason Ekstrand | 2017-03-13 | 1 | -3/+0
  Unused, and we'll rework the way wayland-drm-client-protocol.h is generated with a later commit.
  v2 [Emil]
    - Also remove wayland-client.h
  Signed-off-by: Emil Velikov
  Reviewed-by: Jason Ekstrand
* genxml: remove shebang from gen_pack_header.py | Emil Velikov | 2017-03-10 | 1 | -1/+0
  Analogous to earlier commit(s).
  Signed-off-by: Emil Velikov
  Reviewed-by: Eric Engestrom
* anv: change BLOCK_POOL_MEMFD_SIZE to exactly 2GB | Tapani Pälli | 2017-03-08 | 1 | -1/+1
  This is what the comment above the definition says, and the change fixes an issue with the 32-bit build, where BLOCK_POOL_MEMFD_SIZE is used as an ftruncate parameter and the constant currently gets converted from 4294967296 to 0.
  Signed-off-by: Tapani Pälli
  Reviewed-by: Plamena Manolova
  Reviewed-by: Jason Ekstrand
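The arithmetic behind the 32-bit failure: 4294967296 is 2^32, whose low 32 bits are all zero, so converting it to a 32-bit unsigned type yields 0, while 2^31 (2 GiB) is still representable. The sketch below only demonstrates that unsigned conversion with illustrative macros; which exact type performed the truncation in the driver is not stated in the commit, and whether 2^31 also fits depends on the signedness and width of the receiving type.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB (1ull << 32)   /* 4294967296 */
#define TWO_GIB  (1ull << 31)   /* 2147483648 */

/* Models a 64-bit constant passing through a 32-bit unsigned type.
 * Unsigned conversion is well-defined: the value is reduced mod 2^32. */
static uint32_t to_u32(unsigned long long v)
{
   return (uint32_t)v;
}
```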
* i965: Remove use of deprecated drm_intel_aub routines | Chris Wilson | 2017-03-07 | 2 | -20/+18
  With mesa/drm commit cd2f91e18db087edf93fed828e568ee53b887860
      Author: Kristian Høgsberg Kristensen
      Date:   Fri Jul 31 10:47:50 2015 -0700
          intel: Drop aub dumping functionality
  the drm_intel_aub routines are mere stubs and do nothing. Likewise, remove our invocations.
  Signed-off-by: Chris Wilson
  Reviewed-by: Kenneth Graunke
* anv: Make the framebuffer-renderpass format assert non-fatal | Jason Ekstrand | 2017-03-07 | 1 | -1/+1
  This should let Dota 2 run on debug builds, though it will spew errors like mad. Hopefully, Valve will get this fixed sooner rather than later.
  Reviewed-by: Lionel Landwerlin
* anv: Drop the anv_validate block helper | Jason Ekstrand | 2017-03-07 | 2 | -13/+3
  Over the course of driver development, we've come up with a number of different schemes for adding giant blocks of asserts inside the driver. This one is only being used once in anv_pipeline.c, and the way it's being used actually generates compiler warnings in release builds. This commit drops the anv_validate macro and just puts the contents of the one validation function inside of a "#ifdef DEBUG" guard.
  Reviewed-by: Lionel Landwerlin
* anv: Get rid of the stub() macros | Jason Ekstrand | 2017-03-07 | 3 | -17/+5
  Except for a few unimplemented things on gen7, we don't really have stubs anymore, so we should drop this. This commit replaces the few gen7 stub() calls with explicitly labeled finishme's and makes the sparse binding stuff silently no-op or return a FEATURE_NOT_PRESENT error.
  Reviewed-by: Lionel Landwerlin
* anv: Remove a pointless finishme | Jason Ekstrand | 2017-03-07 | 1 | -4/+0
  We've been supporting multiple shaders per module for some time now.
  Reviewed-by: Lionel Landwerlin
* anv: Convert the HiZ finishme's to perf_warn | Jason Ekstrand | 2017-03-07 | 1 | -4/+4
  Reviewed-by: Lionel Landwerlin
* anv: Add a performance warning helper | Jason Ekstrand | 2017-03-07 | 2 | -0/+27
  This acts identically to anv_finishme, except that it only dumps out these nice log messages if you run with INTEL_DEBUG=perf.
  Reviewed-by: Lionel Landwerlin
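A minimal sketch of such a helper, assuming a debug-flag bitmask parsed once from the INTEL_DEBUG environment variable. The DEBUG_PERF bit, the substring-match parsing, and the perf_warn() name are hypothetical simplifications, not Mesa's actual implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEBUG_PERF (1u << 0)   /* hypothetical flag bit */

static unsigned debug_flags;

/* Simplified parsing: any occurrence of "perf" enables the flag. */
static void parse_debug_flags(void)
{
   const char *env = getenv("INTEL_DEBUG");
   debug_flags = (env && strstr(env, "perf")) ? DEBUG_PERF : 0;
}

/* Prints like a finishme message, but only when perf debugging is on.
 * Returns 1 if a message was printed, 0 otherwise. */
static int perf_warn(const char *fmt, ...)
{
   if (!(debug_flags & DEBUG_PERF))
      return 0;

   va_list ap;
   va_start(ap, fmt);
   fprintf(stderr, "perf warning: ");
   vfprintf(stderr, fmt, ap);
   fprintf(stderr, "\n");
   va_end(ap);
   return 1;
}
```

Gating on a flag keeps these messages out of normal runs, which is why converting the HiZ finishme's to this kind of warning makes them opt-in.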
* i965: don't require 64bit cmpxchg | Grazvydas Ignotas | 2017-03-06 | 1 | -3/+11
  There are still some distributions trying to support unfortunate people with old or exotic CPUs that don't have 64-bit atomic operations. The only thing preventing compilation of the Intel driver for them seems to be the initialization of a debug variable.
  v2: use call_once() instead of unsafe code, as suggested by Matt Turner
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
  Signed-off-by: Grazvydas Ignotas
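A sketch of one-time initialization without a 64-bit compare-and-swap. It uses POSIX pthread_once as a stand-in for the C11 call_once() named in the commit; both guarantee that the init function runs exactly once even with concurrent callers. The 64-bit debug variable and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint64_t debug_flags64;   /* hypothetical 64-bit debug mask */
static pthread_once_t debug_once = PTHREAD_ONCE_INIT;
static int init_calls;           /* counts how often init ran */

static void init_debug_flags(void)
{
   /* Would normally parse an environment variable; fixed value here. */
   debug_flags64 = UINT64_C(1) << 40;
   init_calls++;
}

/* Every caller goes through pthread_once, so the 64-bit store happens
 * once, inside the init function, with no 64-bit atomic needed. */
static uint64_t get_debug_flags(void)
{
   pthread_once(&debug_once, init_debug_flags);
   return debug_flags64;
}
```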
* anv: Advertise shaderInt64 on Broadwell and above | Jason Ekstrand | 2017-03-03 | 2 | -1/+2
  Reviewed-by: Samuel Iglesias Gonsálvez
* genxml: Fill out Gen4 and G45 XML. | Kenneth Graunke | 2017-03-03 | 2 | -1/+2232
  This is a work in progress; some things may still need fixing. But it should be in pretty decent shape.
  Signed-off-by: Kenneth Graunke
  Signed-off-by: Jason Ekstrand
* genxml: Depend on Makefile.am for generated sources. | Matt Turner | 2017-03-02 | 1 | -1/+1
  Depending on the generated Makefile means that all generated sources are recreated after ./configure.
  Reviewed-by: Lionel Landwerlin
* anv/image: Allow HiZ on input attachment-capable depth/stencil images | Nanley Chery | 2017-03-02 | 1 | -14/+0
  While an input attachment may only take on one of those two layouts, other depth/stencil attachments that use the same image may have HiZ-enabled layouts.
  Improves the average frame rate on a release candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my SKL GT4.
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
* anv/cmd_buffer: Centralize automatic layout transitions | Nanley Chery | 2017-03-02 | 1 | -42/+12
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
* anv/cmd_buffer: Add attachment transitioning functions | Nanley Chery | 2017-03-02 | 1 | -0/+85
  This is needed to transition input attachments.
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
* anv/blorp: Encapsulate subpass id querying | Nanley Chery | 2017-03-02 | 2 | -6/+17
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
* anv/cmd_buffer: Enable render pass awareness | Nanley Chery | 2017-03-02 | 2 | -0/+10
  v2: Update cmd_state_reset (Jason Ekstrand)
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
* anv/pass: Store subpass attachment reference list | Nanley Chery | 2017-03-02 | 2 | -2/+13
  We'll loop through this array when performing automatic layout transitions.
  v2: Adjust formatting of an assignment (Jason Ekstrand)
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
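The loop described above can be sketched as follows: track each attachment's current layout and, for every reference in a subpass's list, transition the attachment if the layout the subpass requires differs. The enum, struct, and function names are hypothetical; the reference struct only mirrors the shape of VkAttachmentReference, and the actual transition work (resolves, aux changes) is left as a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of image layouts. */
enum layout {
   LAYOUT_UNDEFINED,
   LAYOUT_COLOR_ATTACHMENT,
   LAYOUT_SHADER_READ_ONLY,
   LAYOUT_DEPTH_STENCIL_ATTACHMENT,
};

/* Same shape as VkAttachmentReference: index plus required layout. */
struct attachment_ref {
   uint32_t attachment;
   enum layout layout;
};

/* Transitions every referenced attachment whose tracked layout differs
 * from what the subpass needs; returns the number of transitions. */
static unsigned transition_subpass(enum layout *current,
                                   const struct attachment_ref *refs,
                                   uint32_t ref_count)
{
   unsigned transitions = 0;
   for (uint32_t i = 0; i < ref_count; i++) {
      enum layout *cur = &current[refs[i].attachment];
      if (*cur != refs[i].layout) {
         /* A real driver would emit resolves or aux changes here. */
         *cur = refs[i].layout;
         transitions++;
      }
   }
   return transitions;
}
```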
* anv/pass: Fix size of anv_render_pass:subpass_attachments | Nanley Chery | 2017-03-02 | 1 | -2/+1
  Don't allocate space for resolve attachments if the subpass has none.
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand
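The sizing rule can be sketched as a count over all subpasses, where resolve slots (one per color attachment) are reserved only when the subpass actually supplies resolve attachments. The struct below is a hypothetical reduction of VkSubpassDescription, with booleans standing in for the pResolveAttachments and pDepthStencilAttachment pointers; it is not anv's allocation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a subpass description. */
struct subpass_desc {
   uint32_t input_count;
   uint32_t color_count;
   bool has_resolve;         /* pResolveAttachments != NULL */
   bool has_depth_stencil;   /* pDepthStencilAttachment != NULL */
};

/* Total reference slots needed across all subpasses. */
static uint32_t count_subpass_attachments(const struct subpass_desc *subpasses,
                                          uint32_t count)
{
   uint32_t total = 0;
   for (uint32_t i = 0; i < count; i++) {
      const struct subpass_desc *s = &subpasses[i];
      total += s->input_count + s->color_count;
      if (s->has_resolve)
         total += s->color_count;   /* one resolve per color attachment */
      if (s->has_depth_stencil)
         total += 1;
   }
   return total;
}
```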
* anv: Store the user's VkAttachmentReference | Nanley Chery | 2017-03-02 | 8 | -52/+47
  We will be using the image layout. Store the full struct directly from the user.
  Signed-off-by: Nanley Chery
  Reviewed-by: Jason Ekstrand