path: root/src/intel/Makefile.sources
Commit message | Author | Age | Files | Lines
* anv: Implement vkCmdDispatchBase | Jason Ekstrand | 2018-03-07 | 1 | -0/+1
    This is part of the device groups extension/feature but it's a decent chunk of work in its own right so it's worth breaking into its own patch. The mechanism we use is fairly straightforward: we just push the base work group id into the shader and add it to the work group id we get from dispatch.

    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
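The mechanism described in the commit above can be illustrated with a small sketch. This is not the driver's code: the function name and the tuple representation are assumptions made for illustration, and only the "add the pushed base to the dispatched id" step comes from the commit message.

```python
def global_workgroup_id(base_group, dispatch_group):
    """Add the pushed base work group id to the id produced by the dispatch.

    Both arguments are (x, y, z) tuples. The names are hypothetical.
    """
    return tuple(b + g for b, g in zip(base_group, dispatch_group))


# A dispatch with base (4, 0, 2): the group the hardware numbers (1, 3, 0)
# is seen by the shader as (5, 3, 2).
print(global_workgroup_id((4, 0, 2), (1, 3, 0)))
```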
* intel: Split gen_device_info out into libintel_dev | Jordan Justen | 2018-03-05 | 1 | -2/+4
    Split out the device info so isl doesn't depend on intel/common. Now it will depend on the new intel/dev device info lib.

    This will allow the decoder in intel/common to use isl, allowing us to apply Ken's patch that removes the genxml duplication of surface formats.

    Signed-off-by: Jordan Justen <[email protected]>
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* intel: add new common header gen_defines.h | Tapani Pälli | 2018-02-28 | 1 | -0/+1
    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Chris Wilson <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* anv/icl: Build anv libs for gen11 | Anuj Phogat | 2018-02-16 | 1 | -0/+4
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* intel/isl/icl: Build and use gen11 surface state emit functions | Anuj Phogat | 2018-02-15 | 1 | -0/+4
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
* intel/genxml/icl: Generate packing headers | Anuj Phogat | 2018-02-15 | 1 | -2/+4
    Move build system changes into one patch (Ken, Emil)

    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Dylan Baker <[email protected]>
* anv/extensions: Generate a header file with extension tables | Jason Ekstrand | 2018-01-23 | 1 | -1/+2
    This allows us better introspection into extensions.

    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* vulkan: move anv VK_EXT_debug_report implementation to common code. | Bas Nieuwenhuizen | 2018-01-17 | 1 | -1/+0
    This allows it to be used in radv as well. I moved the remaining stubs back to anv_device.c as they were just trivial.

    This does not move the vk_errorf/anv_perf_warn or the object type macros, as those depend on anv types and logging.

    Reviewed-by: Tapani Pälli <[email protected]>
* intel/fs: Implement GRF bank conflict mitigation pass. | Francisco Jerez | 2017-12-07 | 1 | -0/+1
    Unnecessary GRF bank conflicts increase the issue time of ternary instructions (the overwhelmingly most common of which is MAD) by roughly 50%, leading to reduced ALU throughput. This pass attempts to minimize the number of bank conflicts by rearranging the layout of the GRF space post-register allocation. It's in general not possible to eliminate all of them without introducing extra copies, which are typically more expensive than the bank conflict itself.

    In a shader-db run on SKL this helps roughly 46k shaders:

        total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
        conflicts in affected programs: 816222 -> 407702 (-50.05%)
        helped: 46234
        HURT: 72

    The running time of shader-db itself on SKL seems to be increased by roughly 2.52%±1.13% with n=20 due to the additional work done by the compiler back-end.

    On earlier generations the pass is somewhat less effective in relative terms because the hardware incurs a bank conflict anytime the last two sources of the instruction are duplicate (e.g. while trying to square a value using MAD), which is impossible to avoid without introducing copies. E.g. for a shader-db run on SNB:

        total conflicts in shared programs: 944636 -> 623185 (-34.03%)
        conflicts in affected programs: 853258 -> 531807 (-37.67%)
        helped: 31052
        HURT: 19

    And on BDW:

        total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
        conflicts in affected programs: 1179787 -> 748933 (-36.52%)
        helped: 47592
        HURT: 70

    On SKL GT4e this improves performance of GpuTest Volplosion by 3.64% ±0.33% with n=16.

    NOTE: This patch intentionally disregards some i965 coding conventions for the sake of reviewability. This is addressed by the next squash patch which introduces an amount of (for the most part boring) boilerplate that might distract reviewers from the non-trivial algorithmic details of the pass.

    The following patch is squashed in:

    SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

    Acked-by: Matt Turner <[email protected]>
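The percentages quoted in the shader-db results above follow directly from the before/after conflict counts. As a worked check (the helper name is an assumption; the counts are copied from the commit message):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Conflict counts quoted in the commit message: (before, after).
runs = {
    "SKL shared": (1008981, 600461),
    "SKL affected": (816222, 407702),
    "SNB shared": (944636, 623185),
    "BDW shared": (1418393, 987539),
}

for label, (before, after) in runs.items():
    print(f"{label}: {percent_change(before, after):.2f}%")
```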
* i965: Rename intel_asm_annotation -> brw_disasm_info | Matt Turner | 2017-11-17 | 1 | -3/+3
    It was the only file named intel_* in the compiler.

    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Implement VK_ANDROID_native_buffer (v9) | Chad Versace | 2017-10-18 | 1 | -0/+3
    This implementation is correct (afaict), but takes two shortcuts regarding the import/export of Android sync fds.

    Shortcut 1. When Android calls vkAcquireImageANDROID to import a sync fd into a VkSemaphore or VkFence, the driver instead simply blocks on the sync fd, then puts the VkSemaphore or VkFence into the signalled state. Thanks to implicit sync, this produces correct behavior (with extra latency overhead, perhaps) despite its ugliness.

    Shortcut 2. When Android calls vkQueueSignalReleaseImageANDROID to export a collection of wait semaphores as a sync fd, the driver instead submits the semaphores to the queue, then returns sync fd -1, which informs the caller that no additional synchronization is needed. Again, thanks to implicit sync, this produces correct behavior (with extra batch submission overhead) despite its ugliness.

    I chose to take the shortcuts instead of properly importing/exporting the sync fds for two reasons:

    Reason 1. I've already tested this patch with dEQP and with demo apps. It works. I wanted to get the tested patches into the tree now, and polish the implementation afterwards.

    Reason 2. I want to run this on a 3.18 kernel (gasp!). In 3.18, i915 supports neither Android's sync_fence, nor upstream's sync_file, nor drm_syncobj. Again, I tested these patches on Android with a 3.18 kernel and they work.

    I plan to quickly follow-up with patches that remove the shortcuts and properly import/export the sync fds.

    Non-Testing
    ===========
    I did not test at all using the Android.mk buildsystem. I may have broken it. Please test and review that.

    Testing
    =======
    I tested with 64-bit ARC++ on a Skylake Chromebook and a 3.18 kernel. The following pass (as of patchset v9):

      - a little spinning cube demo APK
      - several Sascha demos
      - dEQP-VK.info.*
      - dEQP-VK.api.wsi.android.* (except dEQP-VK.api.wsi.android.swapchain.*.image_usage, because dEQP wants to create swapchains with VK_IMAGE_USAGE_STORAGE_BIT)
      - dEQP-VK.api.smoke.*
      - dEQP-VK.api.info.instance.*
      - dEQP-VK.api.info.device.*

    v2:
      - Reject VkNativeBufferANDROID if the dma-buf's size is too small for the VkImage.
      - Stop abusing VkNativeBufferANDROID by passing it to vkAllocateMemory during vkCreateImage. Instead, directly import its dma-buf during vkCreateImage with anv_bo_cache_import(). [for jekstrand]
      - Rebase onto Tapani's VK_EXT_debug_report changes.
      - Drop `CPPFLAGS += $(top_srcdir)/include/android`. The dir does not exist.

    v3:
      - Delete duplicate #include "anv_private.h". [per Tapani]
      - Try to fix the Android-IA build in Android.vulkan.mk by following Tapani's example.

    v4:
      - Unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE on the imported gralloc buffer, just as we do for all other winsys buffers in anv_wsi.c. [found by Tapani]

    v5:
      - Really fix the Android-IA build by ensuring that Android.vulkan.mk uses Mesa's vulkan.h and not Android's. Insert -I$(MESA_TOP)/include before -Iframeworks/native/vulkan/include. [for Tapani]
      - In vkAcquireImageANDROID, submit signal operations to the VkSemaphore and VkFence. [for zhou]

    v6:
      - Drop copy-paste duplication in vkGetSwapchainGrallocUsageANDROID(). [found by zhou]
      - Improve comments in vkGetSwapchainGrallocUsageANDROID().

    v7:
      - Fix vkGetSwapchainGrallocUsageANDROID() to inspect its VkImageUsageFlags parameter. [for tfiga]
      - This fix regresses dEQP-VK.api.wsi.android.swapchain.*.image_usage because dEQP wants to create swapchains with VK_IMAGE_USAGE_STORAGE_BIT.

    v8:
      - Drop unneeded goto in vkAcquireImageANDROID. [for tfiga]

    v8.1: (minor changes)
      - Drop errant hunks added by rerere in anv_device.c.
      - Drop explicit mention of VK_ANDROID_native_buffer in anv_entrypoints_gen.py. [for jekstrand]

    v9:
      - Isolate as much Android code as possible, moving it from anv_image.c to anv_android.c. Connect the files with anv_image_from_gralloc(). Remove VkNativeBufferANDROID params from all anv_image.c funcs. [for krh]
      - Replace some intel_loge() with vk_errorf() in anv_android.c.
      - Use © in copyright line. [for krh]

    Reviewed-by: Tapani Pälli <[email protected]> (v5)
    Reviewed-by: Kristian H. Kristensen <[email protected]> (v9)
    Reviewed-by: Jason Ekstrand <[email protected]> (v9)
    Cc: zhoucm1 <[email protected]>
    Cc: Tomasz Figa <[email protected]>
* intel: Add simple logging façade for Android (v2) | Chad Versace | 2017-10-17 | 1 | -1/+3
    I'm bringing up Vulkan in the Android container of Chrome OS (ARC++).

    On Android, stdio goes to /dev/null. On Android, remote gdb is even more painful than the usual remote gdb. On Android, nothing works like you expect and debugging is hell. I need logging.

    This patch introduces a small, simple logging API that can easily wrap Android's API. On non-Android platforms, this logger does nothing fancy. It follows the time-honored Unix tradition of spewing everything to stderr with minimal fuss.

    My goal here is not perfection. My goal is to make a minimal, clean API, that people hate merely a little instead of a lot, and that's good enough to let me bring up Android Vulkan. And it needs to be fast, which means it must be small. No one wants their game to miss frames while aiming a flaming bow into the jaws of an angry robot t-rex, and thus become t-rex breakfast, because some fool had too much fun designing a bloated, ideal logging API.

    If people like it, perhaps we should quickly promote it to src/util.

    The API looks like this:

        #define INTEL_LOG_TAG "intel-vulkan"
        #define DEBUG

        intel_logd("try hard thing with foo=%d", foo);
        n = try_foo(...);
        if (n < 0) {
           intel_loge("%s:%d: foo failed bigtime", __FILE__, __LINE__);
           return VK_ERROR_DEVICE_LOST;
        }

    And produces this on non-Android:

        intel-vulkan: debug: try hard thing with foo=93
        intel-vulkan: error: anv_device.c:182: foo failed bigtime

    v2: Fix meson build. [for dcbaker]

    Reviewed-by: Jason Ekstrand <[email protected]>
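The non-Android behaviour described above ("tag: level: message" written to stderr) can be modelled in a few lines. This is a sketch in Python rather than the C API from the commit; `format_log`, `log_debug` and `log_error` are hypothetical names, and only the output format is taken from the commit message.

```python
import sys


def format_log(tag, level, message):
    """Build one log line in the 'tag: level: message' shape shown above."""
    return f"{tag}: {level}: {message}"


def log_debug(tag, message):
    # Hypothetical counterpart of a debug-level call: plain stderr output.
    print(format_log(tag, "debug", message), file=sys.stderr)


def log_error(tag, message):
    # Hypothetical counterpart of an error-level call.
    print(format_log(tag, "error", message), file=sys.stderr)


log_debug("intel-vulkan", "try hard thing with foo=%d" % 93)
log_error("intel-vulkan", "%s:%d: foo failed bigtime" % ("anv_device.c", 182))
```

Keeping the formatting in one pure function makes the line shape easy to check without capturing stderr.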
* intel/compiler: Make brw_nir_lower_intrinsics compute-specific | Jason Ekstrand | 2017-10-12 | 1 | -1/+1
    It's already only ever called from brw_compile_cs and only handles compute intrinsics. Let's just make it CS-specific. We can always make it handle other stages again later if we want.

    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* anv: add nir lowering pass for ycbcr textures | Lionel Landwerlin | 2017-10-06 | 1 | -0/+1
    This pass implements all the implicit conversions required by the VK_KHR_sampler_ycbcr_conversion specification.

    It also inserts plane sources onto sampling instructions that we then let the pipeline layout pass deal with, when mapping things correctly to descriptors.

    v2: Add new file to meson build (Lionel)
        Use nir_frcp() rather than (1.0f / x) (Jason)
        Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
        Return progress (Jason)
        Account for array of samplers (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* intel: automake: add isl_genX_priv.h in the source list | Juan A. Suarez Romero | 2017-09-19 | 1 | -0/+1
    Fixes:

        CC isl/isl_format_layout.lo
        In file included from ../../../../src/intel/isl/isl_storage_image.c:24:0:
        ../../../../src/intel/isl/isl_priv.h:170:29: fatal error: isl_genX_priv.h: No such file or directory
        compilation terminated.
        Makefile:2936: recipe for target 'isl/isl_storage_image.lo' failed
        make[5]: *** [isl/isl_storage_image.lo] Error 1
        make[5]: *** Waiting for unfinished jobs....
        In file included from ../../../../src/intel/isl/isl.c:36:0:
        ../../../../src/intel/isl/isl_priv.h:170:29: fatal error: isl_genX_priv.h: No such file or directory
        compilation terminated.
        make[5]: *** [isl/isl.lo] Error 1
        Makefile:2936: recipe for target 'isl/isl.lo' failed
        make[4]: *** [all] Error 2

    when running `make distcheck`.

    v2: Fix commit title (Emil)

    Reviewed-by: Emil Velikov <[email protected]>
* anv: implementation of VK_EXT_debug_report extension | Tapani Pälli | 2017-09-12 | 1 | -0/+1
    This patch adds the functionality the extension requires to manage a list of application-provided callbacks and to handle debug reporting from both the driver and the application side.

    v2: remove useless helper anv_debug_report_call
        add locking around callbacks list
        use vk_alloc2, vk_free2
        refactor CreateDebugReportCallbackEXT
        fix bugs found with crucible testing

    v3: provide ANV_FROM_HANDLE and use it
        misc fixes for issues Jason found
        use vk_find_struct_const for finding ctor_cb

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
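The idea of a locked list of application-provided callbacks, as described in the commit above, can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the choice to copy the list under the lock and invoke callbacks outside it is an assumption, not something the commit message states.

```python
import threading


class DebugReportCallbacks:
    """Hypothetical registry: a list of callbacks guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []

    def create(self, callback):
        # Register an application-provided callback.
        with self._lock:
            self._callbacks.append(callback)
        return callback

    def destroy(self, callback):
        # Unregister a previously created callback.
        with self._lock:
            self._callbacks.remove(callback)

    def report(self, message):
        # Snapshot under the lock, then call without holding it (assumption).
        with self._lock:
            snapshot = list(self._callbacks)
        for callback in snapshot:
            callback(message)


received = []
registry = DebugReportCallbacks()
handle = registry.create(received.append)
registry.report("driver warning")
registry.destroy(handle)
registry.report("not delivered")
print(received)
```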
* i965: Extract functions dealing with register types to separate file | Matt Turner | 2017-08-21 | 1 | -0/+2
    I'm going to encapsulate all of the logic dealing with register types in this file. Rename the parameters for the hardware encodings from type -> hw_type at the same time.

    Reviewed-by: Scott D Phillips <[email protected]>
* intel: move gen_decoder.* back to COMMON_FILES | Tapani Pälli | 2017-08-02 | 1 | -4/+2
    This change reverts commit 4f695731; we want to be able to build with -DDEBUG and gen_decoder on Android.

    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* anv: Autogenerate extension query and lookup | Jason Ekstrand | 2017-08-01 | 1 | -1/+2
    As time goes on, extension advertising is going to get more complex. Today, we either implement an extension or we don't. However, in the future, whether or not we advertise an extension will depend on kernel or hardware features. This commit introduces a python codegen framework that generates the anv_EnumerateFooExtensionProperties functions as well as a pair of anv_foo_extension_supported functions for querying for the support of a given extension string.

    Each extension has an "enable" predicate that is any valid C expression. For device extensions, the physical device is available as "device" so the expression could be something such as "device->has_kernel_feature". For instance extensions, the only option is VK_USE_PLATFORM defines.

    This mechanism also means that we have a single one-line-per-entry table for all extension declarations instead of the two tables we had in anv_device.c and the one we had in anv_entrypoints_gen.py. The Python code is smart and uses the XML to determine whether an extension is an instance extension or device extension.

    Reviewed-by: Emil Velikov <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
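To make the "one-line-per-entry table plus enable predicate" idea concrete, here is a minimal sketch of such a generator. It is not the generator from the commit: the extension names, the emitted function name `example_device_extension_supported`, and the struct name are all hypothetical. Only the shape (a table of names with C "enable" expressions, turned into a lookup function) follows the description above.

```python
# Hypothetical one-line-per-entry table: (extension name, C "enable" predicate).
DEVICE_EXTENSIONS = [
    ("VK_EXAMPLE_always_on", "true"),
    ("VK_EXAMPLE_needs_kernel", "device->has_kernel_feature"),
]


def generate_supported_function(extensions):
    """Return C source for a lookup function built from the table."""
    lines = [
        "bool",
        "example_device_extension_supported(struct example_device *device,",
        "                                   const char *name)",
        "{",
    ]
    for name, enable in extensions:
        lines.append(f'   if (strcmp(name, "{name}") == 0)')
        lines.append(f"      return {enable};")
    lines.append("   return false;")
    lines.append("}")
    return "\n".join(lines)


print(generate_supported_function(DEVICE_EXTENSIONS))
```

Because the predicate is pasted verbatim into the generated C, any valid C expression over `device` works, which is the flexibility the commit message describes.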
* intel/blorp: ship blorp_genX_exec.h within the tarball | Emil Velikov | 2017-07-24 | 1 | -0/+1
    Fixes: c9cb37b2a6c ("intel/blorp: Add a partial resolve pass for MCS")
    Signed-off-by: Emil Velikov <[email protected]>
* i965: Select ranges of UBO data to be uploaded as push constants. | Kenneth Graunke | 2017-07-13 | 1 | -0/+1
    This adds a NIR pass that decides which portions of UBOs we should upload as push constants, rather than pull constants.

    v2: Switch to uint16_t for the UBO block number, because we may have a lot of them in Vulkan (suggested by Jason). Add more comments about bitfield trickery (requested by Matt).

    v3: Skip vec4 stages for now...I haven't finished wiring up support in the vec4 backend, and so pushing the data but not using it will just be wasteful.

    Reviewed-by: Matt Turner <[email protected]>
* intel/isl: Add basic modifier introspection | Jason Ekstrand | 2017-07-12 | 1 | -0/+1
    Reviewed-by: Topi Pohjolainen <[email protected]>
    Reviewed-by: Chad Versace <[email protected]>
* intel: Move the DRM uapi headers to a non-Intel location. | Eric Anholt | 2017-07-12 | 1 | -6/+0
    I want to remove vc4's dependency on headers from libdrm as well, but storing multiple copies of drm_fourcc.h in our tree would be silly.

    v2: Update Android.mk as well, move distcheck drm*.h references to top-level noinst_HEADERS.

    Reviewed-by: Lionel Landwerlin <[email protected]> (v1)
    Reviewed-by: Daniel Stone <[email protected]> (v1)
    Reviewed-by: Rob Herring <[email protected]>
* intel: Move clflush helpers from anv to common/gen_clflush.h. | Kenneth Graunke | 2017-07-10 | 1 | -0/+1
    I want to use these in the OpenGL driver as well.

    v2: Add to COMMON_FILES in Makefile.sources (caught by Emil)

    Reviewed-by: Daniel Vetter <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* anv/i965: drop libdrm_intel dependency completely | Lionel Landwerlin | 2017-06-27 | 1 | -0/+6
    With Ken's work to drop the library dependency on libdrm_intel, we now only depend on libdrm for the kernel uapi headers it provides. It seems like we're better off just embedding those headers ourselves, making the lives of people developing new features tightly integrated with the kernel a tiny bit easier.

    This change also makes it a bit more obvious what cflags/libs are required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS into I915_CFLAGS/LIBS.

    Headers were generated from drm-tip on the following commit:

        commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b
        Merge: 338ffbf7cb5e c0bc126f97fb
        Author: Dave Airlie <[email protected]>
        Date: Tue Jun 27 07:24:49 2017 +1000

            Backmerge tag 'v4.12-rc7' into drm-next

    v2: Use installed files from the kernel (Daniel Vetter)

    v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Acked-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Enable vulkan build for gen10 | Anuj Phogat | 2017-06-22 | 1 | -0/+4
    This patch just enables building Vulkan libs for gen10. We still don't have gen10 support enabled on Vulkan.

    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965/cnl: Wire up Mesa build files for gen10 | Anuj Phogat | 2017-06-09 | 1 | -2/+8
    V2: Remove isl_gen10.c and isl_gen10.h

    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* i965: Move clip program compilation to the compiler | Jason Ekstrand | 2017-05-26 | 1 | -0/+7
    Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Move SF compilation to the compiler | Jason Ekstrand | 2017-05-26 | 1 | -0/+1
    Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/pipeline: Add shader lowering for multiview | Jason Ekstrand | 2017-05-03 | 1 | -0/+1
    v2 (Jason Ekstrand):
      - Take a view_mask rather than a whole subpass
      - Build the view mask into the VS shader key

    Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv: Move queues, events, and semaphores to their own file | Jason Ekstrand | 2017-04-27 | 1 | -0/+1
    Things are about to get more complicated, especially as far as semaphores are concerned.

    Reviewed-by: Chad Versace <[email protected]>
* i965/fs: rename lower_d2x to lower_conversions | Samuel Iglesias Gonsálvez | 2017-04-14 | 1 | -1/+1
    v2:
      - Change the name to lower_conversions.

    Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
    Reviewed-by: Francisco Jerez <[email protected]>
* intel/isl: Add support for emitting depth/stencil/hiz | Jason Ekstrand | 2017-04-10 | 1 | -0/+7
    Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: genxml: compress all gen files into one | Lionel Landwerlin | 2017-03-31 | 1 | -8/+2
    Combining all the files into a single string didn't make any difference in the size of the aubinator binary. With this change we now also embed gen4/4.5/5 descriptions, which increases the aubinator size by ~16Kb.

    v2 (Lionel): rebase makefiles

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* genxml: New generated header genX_bits.h (v6) | Chad Versace | 2017-03-28 | 1 | -1/+5
    genX_bits.h contains the sizes of bitfields in genxml instructions, structures, and registers. It also defines some functions to query those sizes.

    isl_surf_init() will use the new header to validate that requested pitches fit in their destination bitfields.

    What's currently in genX_bits.h:

      - Each CONTAINER::Field from gen*.xml that has a bitsize has a macro in genX_bits.h:

            #define GEN{N}_CONTAINER_Field_bits {bitsize}

      - For each set of macros whose name, after stripping the GEN prefix, is the same, genX_bits.h contains a query function:

            static inline uint32_t __attribute__((pure))
            CONTAINER_Field_bits(const struct gen_device_info *devinfo);

    v2 (Chad Versace):
      - Parse the XML instead of scraping the generated gen*_pack.h headers.

    v3 (Dylan Baker):
      - Port to Mako.

    v4 (Jason Ekstrand):
      - Make the _bits functions take a gen_device_info.

    v5 (Chad Versace):
      - Fix autotools out-of-tree build.
      - Fix Android build. Tested with git://github.com/android-ia/manifest.
      - Fix macro names. They were all missing the "_bits" suffix.
      - Fix macro names more. Remove all double-underscores.
      - Unindent all generated code. (It was floating in a sea of whitespace).
      - Reformat header to appear human-written not machine-generated.
      - Sort gens from high to low. Newest gens should come first because, when we read code, we likely want to read the gen8/9 code and ignore the gen4 code. So put the gen4 code at the bottom.
      - Replace 'const' attributes with 'pure', because the functions now have a pointer parameter.
      - Add --cpp-guard flag. Used by Android.
      - Kill class FieldCollection. After Jason's rewrite, it was just a dict.

    v6 (Chad Versace):
      - Replace `key not in d.keys()` with `key not in d`. [for dylan]

    Co-authored-by: Dylan Baker <[email protected]>
    Co-authored-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]> (v5)
    Reviewed-by: Dylan Baker <[email protected]> (v6)
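The macro naming scheme and the "newest gens first" ordering described above can be sketched with a tiny generator. This is an illustration under assumptions, not the real script: the input dictionary, the container and field names, and the bit sizes are all made up; only the `GEN{N}_CONTAINER_Field_bits` pattern and the high-to-low sort come from the commit message.

```python
def generate_bits_macros(fields):
    """Emit one '#define GEN{N}_CONTAINER_Field_bits {bitsize}' per entry.

    `fields` maps (gen, container, field) to a bit size. Entries are sorted
    with the highest gen first, then by container and field name.
    """
    ordered = sorted(fields.items(),
                     key=lambda item: (-item[0][0], item[0][1], item[0][2]))
    return [f"#define GEN{gen}_{container}_{field}_bits {bits}"
            for (gen, container, field), bits in ordered]


# Hypothetical sample data; real values would come from the gen*.xml files.
sample = {
    (7, "EXAMPLE_STATE", "Pitch"): 17,
    (9, "EXAMPLE_STATE", "Pitch"): 18,
    (8, "EXAMPLE_STATE", "Pitch"): 18,
}

for line in generate_bits_macros(sample):
    print(line)
```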
* genxml: Define GENXML_XML_FILES in Makefile.sources | Chad Versace | 2017-03-24 | 1 | -0/+10
    The future header genX_bits.h will depend on GENXML_XML_FILES.

    Reviewed-by: Emil Velikov <[email protected]>
* intel: move gen_decoder.* to DECODER_FILES | Tapani Pälli | 2017-03-23 | 1 | -2/+4
    This patch adds DECODER_FILES for libintel_common, so that platforms such as Android that do not currently use this functionality can opt out.

    Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
    Signed-off-by: Tapani Pälli <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* intel: Move tools/decoder.[ch] to common/gen_decoder.[ch]. | Kenneth Graunke | 2017-03-21 | 1 | -0/+2
    This way they become part of libintel_common.la so I can use them in the i965 driver.

    Reviewed-by: Emil Velikov <[email protected]>
* anv: Use on-the-fly surface states for dynamic buffer descriptors | Jason Ekstrand | 2017-03-13 | 1 | -1/+0
    We have a performance problem with dynamic buffer descriptors. Because we are currently implementing them by pushing an offset into the shader and adding that offset onto the already existing offset for the UBO/SSBO operation, all UBO/SSBO operations on dynamic descriptors are indirect. The back-end compiler implements indirect pull constant loads using what basically amounts to a texelFetch instruction. For pull constant loads with constant offsets, however, we use an oword block read message which goes through the constant cache and reads a whole cache line at a time. Because of these two things, direct pull constant loads are much faster than indirect pull constant loads. Because all loads from dynamically bound buffers are indirect, the user takes a substantial performance penalty when using this "performance" feature.

    There are two potential solutions I have seen for this problem. The alternate solution is to continue pushing offsets into the shader but wire things up in the back-end compiler so that we use the oword block read messages anyway. The only reason we can do this is because we know a priori that the dynamic offsets are uniform and 16-byte aligned. Unfortunately, thanks to the 16-byte alignment requirement of the oword messages, we can't do some general "if the indirect offset is uniform, use an oword message" sort of thing.

    This solution, however, is recommended for a few reasons:

     1. Surface states are relatively cheap. We've been using on-the-fly surface state setup for some time in GL and it works well. Also, dynamic offsets with on-the-fly surface state should still be cheaper than allocating new descriptor sets every time you want to change a buffer offset which is really the only requirement of the dynamic offsets feature.

     2. This requires substantially less compiler plumbing. Not only can we delete the entire apply_dynamic_offsets pass but we can also avoid having to add architecture for passing dynamic offsets to the back-end compiler in such a way that it can continue using oword messages.

     3. We get robust buffer access range-checking for free. Because the offset and range are baked into the surface state, we no longer need to pass ranges around and do bounds-checking in the shader.

     4. Once we finally get UBO pushing implemented, it will be much easier to handle pushing chunks of dynamic descriptors if the compiler remains blissfully unaware of dynamic descriptors.

    This commit improves performance of The Talos Principle on ULTRA settings by around 50% and brings it nicely into line with OpenGL performance.

    Reviewed-by: Lionel Landwerlin <[email protected]>
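Point 3 above (offset and range baked into the surface state) can be pictured with a small arithmetic sketch. Everything here is an assumption for illustration: the function name, the parameters, and in particular the clamping rule are not taken from the driver. The sketch only shows why a shader needs no separate bounds information once the final base and size are fixed at bind time.

```python
def dynamic_buffer_surface(buffer_size, desc_offset, desc_range, dynamic_offset):
    """Return the (base, size) a surface state would describe.

    base: descriptor offset plus the dynamic offset supplied at bind time.
    size: the descriptor's range, clamped to what remains of the buffer
          (the clamping rule is an assumption of this sketch).
    """
    base = desc_offset + dynamic_offset
    size = max(0, min(desc_range, buffer_size - base))
    return base, size


# A 4096-byte buffer, descriptor at offset 256 with a 1024-byte range,
# bound with a 512-byte dynamic offset.
print(dynamic_buffer_surface(4096, 256, 1024, 512))
```

With the base and size fixed before the shader runs, every access uses a constant offset relative to the surface, which is what lets the faster block-read path apply.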
* intel/vulkan: Get rid of recursive make | Jason Ekstrand | 2017-03-13 | 1 | -0/+65
    v2 [Emil Velikov]
      - Various fixes and initial stab at the Android build.
      - Keep the generation rules/EXTRA_DIST outside the conditional

    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move the back-end compiler to src/intel/compiler | Jason Ekstrand | 2017-03-13 | 1 | -0/+89
    Mostly a dummy git mv with a couple of noticeable parts:

      - With the earlier header cleanups, nothing in src/intel depends on files from src/mesa/drivers/dri/i965/
      - Both Autoconf and Android builds are addressed. Thanks to Mauro and Tapani for the fixups in the latter
      - brw_util.[ch] is not really compiler specific, so it's moved to i965.

    v2:
      - move brw_eu_defines.h instead of brw_defines.h
      - remove no-longer applicable includes
      - add missing vulkan/ prefix in the Android build (thanks Tapani)

    v3:
      - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
      - rebase on top of the oa patches

    [Emil Velikov: commit message, various small fixes throughout]
    Signed-off-by: Emil Velikov <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move intel_debug.h to intel/common/gen_debug.h | Jason Ekstrand | 2017-03-01 | 1 | -0/+2
    This is shared between the Vulkan and GL drivers as it's a requirement of the back-end compiler. However, it doesn't really belong in the compiler. We rename the file to match the prefix of the other stuff in common and because libdrm defines an intel_debug.h and this avoids a pile of possible name conflicts.

    Reviewed-by: Anuj Phogat <[email protected]>
* intel: Share URB configuration code between GL and Vulkan. | Kenneth Graunke | 2016-11-19 | 1 | -0/+1
    This code is far too complicated to cut and paste.

    v2: Update the newly added genX_gpu_memcpy.c; const a few things.

    Signed-off-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/genxml: fix building rules for aubinator required headers | Mauro Rossi | 2016-10-11 | 1 | -1/+3
    New generated headers were introduced by commit 63a366a "intel: aubinator: generate a standalone binary".

    Android does not need aubinator yet, so in order to avoid build errors, the new genxml headers required by aubinator are defined in a separate list. If required, building rules for Android will be added later.

    [Emil Velikov: don't use a _HEADERS variable name (causes warnings)]
    Signed-off-by: Emil Velikov <[email protected]>
* intel: aubinator: generate a standalone binary | Lionel Landwerlin | 2016-10-08 | 1 | -1/+6
    Embed the xml files into the binary, so aubinator can be used from any location.

    v2: Split generation packing into another patch (Jason)
        Check for xxd (Jason)

    v3: Fix out of tree builds (Jason)
        Generate custom variable name rather than names generated by xxd (Lionel)

    v4: Move generated _xml.h files to genxml/ (Sirisha)

    v5: Remove newline from makefile (Jason)

    v6: Add comment on gen*_xml.h creation (Jason)

    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* intel: automake: reference the correct header | Emil Velikov | 2016-10-06 | 1 | -1/+2
    The header was renamed by an earlier commit, so update Makefile.sources accordingly.

        {vulkan/genX_multisample.h => common/gen_sample_positions.h}

    Fixes: c779ad3e661 ("intel: Move Vulkan sample positions to common code")
    Signed-off-by: Emil Velikov <[email protected]>
* intel: Pull the guts of gen7_l3_state.c into a shared helper | Jason Ekstrand | 2016-09-03 | 1 | -1/+3
    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* intel: s/brw_device_info/gen_device_info/ | Jason Ekstrand | 2016-09-03 | 1 | -2/+2
    Generated by:

        sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
        sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
        sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
        sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
        sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* intel: Add a new "common" library for more code sharing | Jason Ekstrand | 2016-09-03 | 1 | -0/+4
    The first thing to go in this new library is brw_device_info.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Jordan Justen <[email protected]>
* i965: Move blorp into src/intel/blorp | Jason Ekstrand | 2016-08-29 | 1 | -0/+8
    At this point, blorp is completely driver agnostic and can be safely moved into its own folder. Soon, we hope to start using it for doing blits in the Vulkan driver.

    Signed-off-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>