| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally. It also means we can easily share the code between the vec4
and FS back-ends if we wish.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces the amount of #ifdef ANDROID we'll have to have inside
the driver. Potentially offering better coverage of the android
extensions.
v2: Move anv_android.h include before anv_entrypoints.h (Tapani)
Fix autotools android build (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for the KHR_display extension to the anv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2: Make sure primary fd is usable
When KHR_display is selected, we try to open the primary node
instead of the render node in case the user wants to use
KHR_display for presentation. However, if we're actually going
to end up using RandR leases, then we don't care if the
resulting fd can't be used for display, but the kernel also
prevents us from using it for drawing when someone else has
master.
v3:
Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args
Suggested-by: Eric Engestrom <[email protected]>
v4:
Adapt primary node usage to new wsi_device_init API
v5:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Remove spurious MM_PER_PIXEL define
Suggested-by: Jason Ekstrand <[email protected]>
v6:
Open DRM master before initializing WSI layer.
The DRM master FD is passed to the WSI layer during
initialization, so we need to open the device slightly earlier
in the function.
Close DRM master in device_finish.
Use anv_gem_get_param to detect working master_fd instead of
directly using the ioctl.
Suggested-by: Jason Ekstrand <[email protected]>
v7:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
matches an existing mode and returns that.
Suggested-by: Jason Ekstrand <[email protected]>
Signed-off-by: Keith Packard <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
Making these part of libintel_common allows us to use them in the DRI
driver. The standalone tool binaries already link against the common
library, too, so it's no harder for them.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split out the device info so isl doesn't depend on intel/common. Now
it will depend on the new intel/dev device info lib.
This will allow the decoder in intel/common to use isl, allowing us to
apply Ken's patch that removes the genxml duplication of surface
formats.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Move build system changes in to one patch (Ken, Emil)
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
| |
This allows us better introspection into extensions.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
It was the only file named intel_* in the compiler.
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implementation is correct (afaict), but takes two shortcuts
regarding the import/export of Android sync fds.
Shortcut 1. When Android calls vkAcquireImageANDROID to import a sync
fd into a VkSemaphore or VkFence, the driver instead simply blocks on
the sync fd, then puts the VkSemaphore or VkFence into the signalled
state. Thanks to implicit sync, this produces correct behavior (with
extra latency overhead, perhaps) despite its ugliness.
Shortcut 2. When Android calls vkQueueSignalReleaseImageANDROID to export
a collection of wait semaphores as a sync fd, the driver instead
submits the semaphores to the queue, then returns sync fd -1, which
informs the caller that no additional synchronization is needed.
Again, thanks to implicit sync, this produces correct behavior (with
extra batch submission overhead) despite its ugliness.
I chose to take the shortcuts instead of properly importing/exporting
the sync fds for two reasons:
Reason 1. I've already tested this patch with dEQP and with demos
apps. It works. I wanted to get the tested patches into the tree now,
and polish the implementation afterwards.
Reason 2. I want to run this on a 3.18 kernel (gasp!). In 3.18, i915
supports neither Android's sync_fence, nor upstream's sync_file, nor
drm_syncobj. Again, I tested these patches on Android with a 3.18
kernel and they work.
I plan to quickly follow-up with patches that remove the shortcuts and
properly import/export the sync fds.
Non-Testing
===========
I did not test at all using the Android.mk buildsystem. I may have broke
it. Please test and review that.
Testing
=======
I tested with 64-bit ARC++ on a Skylake Chromebook and a 3.18 kernel.
The following pass (as of patchset v9):
- a little spinning cube demo APK
- several Sascha demos
- dEQP-VK.info.*
- dEQP-VK.api.wsi.android.*
(except dEQP-VK.api.wsi.android.swapchain.*.image_usage, because
dEQP wants to create swapchains with VK_IMAGE_USAGE_STORAGE_BIT)
- dEQP-VK.api.smoke.*
- dEQP-VK.api.info.instance.*
- dEQP-VK.api.info.device.*
v2:
- Reject VkNativeBufferANDROID if the dma-buf's size is too small for
the VkImage.
- Stop abusing VkNativeBufferANDROID by passing it to vkAllocateMemory
during vkCreateImage. Instead, directly import its dma-buf during
vkCreateImage with anv_bo_cache_import(). [for jekstrand]
- Rebase onto Tapani's VK_EXT_debug_report changes.
- Drop `CPPFLAGS += $(top_srcdir)/include/android`. The dir does not
exist.
v3:
- Delete duplicate #include "anv_private.h". [per Tapani]
- Try to fix the Android-IA build in Android.vulkan.mk by following
Tapani's example.
v4:
- Unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE on the imported
gralloc buffer, just as we do for all other winsys buffers in
anv_wsi.c. [found by Tapani]
v5:
- Really fix the Android-IA build by ensuring that Android.vulkan.mk
uses Mesa' vulkan.h and not Android's. Insert -I$(MESA_TOP)/include
before -Iframeworks/native/vulkan/include. [for Tapani]
- In vkAcquireImageANDROID, submit signal operations to the
VkSemaphore and VkFence. [for zhou]
v6:
- Drop copy-paste duplication in vkGetSwapchainGrallocUsageANDROID().
[found by zhou]
- Improve comments in vkGetSwapchainGrallocUsageANDROID().
v7:
- Fix vkGetSwapchainGrallocUsageANDROID() to inspect its
VkImageUsageFlags parameter. [for tfiga]
- This fix regresses dEQP-VK.api.wsi.android.swapchain.*.image_usage
because dEQP wants to create swapchains with
VK_IMAGE_USAGE_STORAGE_BIT.
v8:
- Drop unneeded goto in vkAcquireImageANDROID. [for tfiga]
v8.1: (minor changes)
- Drop errant hunks added by rerere in anv_device.c.
- Drop explicit mention of VK_ANDROID_native_buffer in
anv_entrypoints_gen.py. [for jekstrand]
v9:
- Isolate as much Android code as possible, moving it from anv_image.c
to anv_android.c. Connect the files with anv_image_from_gralloc().
Remove VkNativeBufferANDROID params from all anv_image.c
funcs. [for krh]
- Replace some intel_loge() with vk_errorf() in anv_android.c.
- Use © in copyright line. [for krh]
Reviewed-by: Tapani Pälli <[email protected]> (v5)
Reviewed-by: Kristian H. Kristensen <[email protected]> (v9)
Reviewed-by: Jason Ekstrand <[email protected]> (v9)
Cc: zhoucm1 <[email protected]>
Cc: Tomasz Figa <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm bringing up Vulkan in the Android container of Chrome OS (ARC++).
On Android, stdio goes to /dev/null. On Android, remote gdb is even more
painful than the usual remote gdb. On Android, nothing works like you
expect and debugging is hell. I need logging.
This patch introduces a small, simple logging API that can easily wrap
Android's API. On non-Android platforms, this logger does nothing fancy.
It follows the time-honored Unix tradition of spewing everything to
stderr with minimal fuss.
My goal here is not perfection. My goal is to make a minimal, clean API,
that people hate merely a little instead of a lot, and that's good
enough to let me bring up Android Vulkan. And it needs to be fast,
which means it must be small. No one wants to their game to miss frames
while aiming a flaming bow into the jaws of an angry robot t-rex, and
thus become t-rex breakfast, because some fool had too much fun desiging
a bloated, ideal logging API.
If people like it, perhaps we should quickly promote it to src/util.
The API looks like this:
#define INTEL_LOG_TAG "intel-vulkan"
#define DEBUG
intel_logd("try hard thing with foo=%d", foo);
n = try_foo(...);
if (n < 0) {
intel_loge("%s:%d: foo failed bigtime", __FILE__, __LINE__);
return VK_ERROR_DEVICE_LOST;
}
And produces this on non-Android:
intel-vulkan: debug: try hard thing with foo=93
intel-vulkan: error: anv_device.c:182: foo failed bigtime
v2: Fix meson build. [for dcbaker]
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's already only ever called from brw_compile_cs and only handles
compute intrinsics. Let's just make it CS-specific. We can always
make it handle other stages again later if we want.
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass implements all the implicit conversions required by the
VK_KHR_sampler_ycbcr_conversion specification.
It also inserts plane sources onto sampling instructions that we then
let the pipeline layout pass deal with, when mapping things correctly
to descriptors.
v2: Add new file to meson build (Lionel)
Use nir_frcp() rather than (1.0f / x) (Jason)
Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
Return progress (Jason)
Account for array of samplers (Jason)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
CC isl/isl_format_layout.lo
In file included from
../../../../src/intel/isl/isl_storage_image.c:24:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
Makefile:2936: recipe for target 'isl/isl_storage_image.lo' failed
make[5]: *** [isl/isl_storage_image.lo] Error 1
make[5]: *** Waiting for unfinished jobs....
In file included from ../../../../src/intel/isl/isl.c:36:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
make[5]: *** [isl/isl.lo] Error 1
Makefile:2936: recipe for target 'isl/isl.lo' failed
make[4]: *** [all] Error 2
when running `make distcheck`.
v2: Fix commit title (Emil)
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch adds required functionality for extension to manage a list of
application provided callbacks and handle debug reporting from driver
and application side.
v2: remove useless helper anv_debug_report_call
add locking around callbacks list
use vk_alloc2, vk_free2
refactor CreateDebugReportCallbackEXT
fix bugs found with crucible testing
v3: provide ANV_FROM_HANDLE and use it
misc fixes for issues Jason found
use vk_find_struct_const for finding ctor_cb
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I'm going to encapsulate all of the logic dealing with register types in
this file.
Rename the parameters for the hardware encodings from type -> hw_type at
the same time.
Reviewed-by: Scott D Phillips <[email protected]>
|
|
|
|
|
|
|
|
| |
this change reverts commit 4f695731, we want to be able to build
with -DDEBUG and gen_decoder on Android.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As time goes on, extension advertising is going to get more complex.
Today, we either implement an extension or we don't. However, in the
future, whether or not we advertise an extension will depend on kernel
or hardware features. This commit introduces a python codegen framework
that generates the anv_EnumerateFooExtensionProperties functions as well
as a pair of anv_foo_extension_supported functions for querying for the
support of a given extension string. Each extension has an "enable"
predicate that is any valid C expression. For device extensions, the
physical device is available as "device" so the expression could be
something such as "device->has_kernel_feature". For instance
extensions, the only option is VK_USE_PLATFORM defines.
This mechanism also means that we have a single one-line-per-entry table
for all extension declarations instead of the two tables we had in
anv_device.c and the one we had in anv_entrypoints_gen.py. The Python
code is smart and uses the XML to determine whether an extension is an
instance extension or device extension.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Fixes: c9cb37b2a6c ("intel/blorp: Add a partial resolve pass for MCS")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a NIR pass that decides which portions of UBOS we should
upload as push constants, rather than pull constants.
v2: Switch to uint16_t for the UBO block number, because we may
have a lot of them in Vulkan (suggested by Jason). Add more
comments about bitfield trickery (requested by Matt).
v3: Skip vec4 stages for now...I haven't finished wiring up support
in the vec4 backend, and so pushing the data but not using it
will just be wasteful.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I want to remove vc4's dependency on headers from libdrm as well, but
storing multiple copies of drm_fourcc.h in our tree would be silly.
v2: Update Android.mk as well, move distcheck drm*.h references to
top-level noinst_HEADERS.
Reviewed-by: Lionel Landwerlin <[email protected]> (v1)
Reviewed-by: Daniel Stone <[email protected]> (v1)
Reviewed-by: Rob Herring <[email protected]>
|
|
|
|
|
|
|
|
|
| |
I want to use these in the OpenGL driver as well.
v2: Add to COMMON_FILES in Makefile.sources (caught by Emil)
Reviewed-by: Daniel Vetter <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With Ken's work to drop the library dependency on libdrm_intel, we now
only depend on libdrm for the kernel uapi headers it provides. It
seems like we're better off just embeddeding those headers ourselves,
making the lives of people developping news features tightly
integrated with the kernel a tiny bit easier.
This change also makes it a bit more obvious what cflags/libs are
required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS
into I915_CFLAGS/LIBS.
Headers were generated from drm-tip on the following commit :
commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b
Merge: 338ffbf7cb5e c0bc126f97fb
Author: Dave Airlie <[email protected]>
Date: Tue Jun 27 07:24:49 2017 +1000
Backmerge tag 'v4.12-rc7' into drm-next
v2: Use installed files from the kernel (Daniel Vetter)
v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This patch just enables building Vulkan libs for gen10. We
still don't have gen 10 support enabled on Vulkan.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
V2: Remove isl_gen10.c and isl_gen10.h
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
| |
Things are about to get more complicated, especially as far as
semaphores are concerned.
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
v2:
- Change the name to lower_conversions.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.
With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.
v2 (Lionel): rebase makefiles
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
genX_bits.h contains the sizes of bitfields in genxml instructions,
structures, and registers. It also defines some functions to query those
sizes.
isl_surf_init() will use the new header to validate that requested
pitches fit in their destination bitfields.
What's currently in genX_bits.h:
- Each CONTAINER::Field from gen*.xml that has a bitsize has a macro
in genX_bits.h:
#define GEN{N}_CONTAINER_Field_bits {bitsize}
- For each set of macros whose name, after stripping the GEN prefix,
is the same, genX_bits.h contains a query function:
static inline uint32_t __attribute__((pure))
CONTAINER_Field_bits(const struct gen_device_info *devinfo);
v2 (Chad Versace):
- Parse the XML instead of scraping the generated gen*_pack.h headers.
v3 (Dylan Baker):
- Port to Mako.
v4 (Jason Ekstrand):
- Make the _bits functions take a gen_device_info.
v5 (Chad Versace):
- Fix autotools out-of-tree build.
- Fix Android build. Tested with git://github.com/android-ia/manifest.
- Fix macro names. They were all missing the "_bits" suffix.
- Fix macros names more. Remove all double-underscores.
- Unindent all generated code. (It was floating in a sea of whitespace).
- Reformat header to appear human-written not machine-generated.
- Sort gens from high to low. Newest gens should come first because,
when we read code, we likely want to read the gen8/9 code and ignore
the gen4 code. So put the gen4 code at the bottom.
- Replace 'const' attributes with 'pure', because the functions now
have a pointer parameter.
- Add --cpp-guard flag. Used by Android.
- Kill class FieldCollection. After Jason's rewrite, it was just
a dict.
v6 (Chad Versace):
- Replace `key not in d.keys()` with `key not in d`. [for dylan]
Co-authored-by: Dylan Baker <[email protected]>
Co-authored-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]> (v5)
Reviewed-by: Dylan Baker <[email protected]> (v6)
|
|
|
|
|
|
| |
The future header genX_bits.h will depend on GENXML_XML_FILES.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
patch adds DECODER_FILES for libintel_common, this is so that platforms
such as Android not currently using this functionality can opt out.
Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
This way they become part of libintel_common.la so I can use them in
the i965 driver.
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few of reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
v2 [Emil Velikov]
- Various fixes and initial stab at the Android build.
- Keep the generation rules/EXTRA_DIST outside the conditional
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This is shared between the Vulkan and GL drivers as it's a requirement
of the back-end compiler. However, it doesn't really belong in the
compiler. We rename the file to match the prefix of the other stuff in
common and because libdrm defines an intel_debug.h and this avoids a
pile of possible name conflicts.
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
New generated headers were introduced by commit 63a366a
"intel: aubinator: generate a standalone binary"
Android does not need aubinator yet, so in order to avoid building error,
aubinator required new genxml headers are defined in a separate list.
If required, building rules for Android will be added later.
[Emil Velikov: don't use a _HEADERS variable name (causes warnings)]
Signed-off-by: Emil Velikov <[email protected]>
|