| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had missed a bit of errata - PS scratch needs to be computed as if
there were 4 subslices per slice, rather than 3.
Skylake Broxton Kabylake
GT1 GT2 GT3 GT4 2x6 3x6 GT1 GT1.5 GT2 GT3 GT4
Actual Slices 1 1 2 3 1 1 1 1 1 2 3
Total Subslices 3 3 6 9 2 3 2 3 3 6 9
Subsl. for PS Scratch 4 4 8 12 4 4 4 4 4 8 12
Note that Skylake GT1-3 already worked because we allocated 64 * 9
(trying to use a value that would work on GT4, with 9 subslices),
and the actual required values were 64 * 4 or 64 * 8. However, all
others (Skylake GT4, Broxton, and Kabylake GT1-4) underallocated,
which can lead to scratch writes trashing random process memory,
and rendering corruption or GPU hangs.
Fixes GPU hangs and rendering corruption on Skylake GT4 in shaders that
spill. Particularly, dEQP-GLES31.functional.ubo.all_per_block_buffers.*
now runs successfully with no hangs and renders correctly. This may
fix problems on Broxton and Kabylake as well.
Cc: "13.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan has introduced the consept of .specVersion which can be used to
attribute changes of the said extension.
The current loader does not check the value, thus it have gone unnoticed
that the driver exposes an old version of the following extensions:
VK_KHR_xcb_surface (Rev 6)
VK_KHR_xlib_surface (Rev 6)
VK_KHR_wayland_surface (Rev 5)
- Updated the surface create function to take a pCreateInfo structure
VK_KHR_swapchain (Rev 68)
- Moved the "validity" include for vkAcquireNextImage to be in its proper
place, after the prototype and list of parameters.
...
According to the documentation:
* pname:specVersion is the version of this extension.
It is an integer, incremented with backward compatible changes.
Based on the history of vk.xml the above (latest) revision has been
available since Vulkan 1.0 so even if they were any backwards
incompatible change(s) [as hinted by the revision log] those should be
safe.
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since our surface state buffer is shared by all batches, the kernel does a
full stall and sync with the CPU between batches every time we call
execbuf2 because it refuses to do relocations on an active buffer. Doing
them in userspace and passing the NO_RELOC flag to the kernel allows us to
perform the relocations without stalling.
This improves the performance of Dota 2 by around 30% on a Sky Lake GT2.
v2 (Jason Ekstrand):
- Better comments (Chris Wilson)
- Fixed write_reloc for correct canonical form (Chris Wilson)
v3 (Jason Ekstrand):
- Skip relocations which aren't needed
- Provide an environment variable to always use the kernel
- More comments about correctness (Chris Wilson)
v4 (Jason Ekstrand):
- More comments (Chris Wilson)
v5 (Jason Ekstrand):
- Rebase on top of moving execbuf2 setup go QueueSubmit
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ever since the early days of the Vulkan driver, we've been setting up the
lists of relocations at EndCommandBuffer time. The idea behind this was to
move some of the CPU load out of QueueSubmit which the client is required
to lock around and into command buffer building which could be done in
parallel. Then QueueSubmit basically just becomes a bunch of execbuf2
calls.
Technically, this works. However, when you start to do more in QueueSubmit
than just execbuf2, you start to run into problems. In particular, if a
block pool is resized between EndCommandBuffer and QueueSubmit, the list of
anv_bo's and the execbuf2 object list can get out of sync. This can cause
problems if, for instance, you wanted to do relocations in userspace.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original reason for putting it in the batch_bo was to allow primaries
to share it across secondaries or something like that. However, the
relocation lists in secondary command buffers are are always left alone and
copied into the primary command buffer's relocation list. This means that
the offset really applies at the command buffer level and putting it in the
batch_bo doesn't make sense. This fixes a couple of potential bugs around
re-submission of command buffers that are not likely to be hit but are bugs
none the less.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a little helper struct for storing everything we use to
build an execbuf2 call. Since the add_bo function really has nothing to do
with a command buffer, it makes sense to break it out a bit. This also
reduces some of the churn in the next commit.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The old version wasn't properly handling large addresses where we have to
sign-extend to get it into the "canonical form" expected by the hardware.
Also, the new version is capable of doing a clflush of the newly written
reloc if requested.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since -1 is an invalid GPU address, this lets us know whether or not we
have a valid address for a buffer. We don't get a valid address until the
first time that buffer is used in an execbuf2 ioctl.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous implementation was being overly clever and using the
anv_bo::size field as its mutex. Scratch pool allocations don't happen
often, will happen at most a fixed number of times, and never happen in the
critical path (they only happen in shader compilation). We can make this
much simpler by just using the device mutex. This also means that we can
start using anv_bo_init_new directly on the bo and avoid setting fields
one-at-a-time.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
This ensures that we're always setting all of the fields in anv_bo
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because our relocation processing happens at EndCommandBuffer time and
because RENDER_SURFACE_STATE objects may be shared by batches, we really
have no clue whatsoever what address is actually written to the relocation
offset in the BO. We need to stop making such claims to the kernel and
just let it relocate for us.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This puts the actual execbuf2 call in anv_batch_chain.c along with the
other relocation stuff.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This wrapper ensures that we always update all anv_bo::offset fields based
on the offsets returned by the kernel.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kristian H. Kristensen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
When you fire up Dota2 on Haswell you get spammed with thousands of
"Implement Gen7 HZ ops" finishme's. The point of anv_finishme is to act as
a reminder that there is something left to implement. Printing it once
should be sufficient.
Signed-off-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Some of the details of this function are very confusing and have a long
history. We should document that history and this seems like the best
place to do it.
Signed-off-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At least on Sky Lake, after emitting 3DSTATE_CONSTANT_*, you are required
to re-emit the 3DSTATE_BINDING_TABLE_POINTERS packet for the corresponding
stage. If you don't, double-buffering may fail and you may get the wrong
constants. It turns out that you need to do this even if you have no push
constants to speak of or else the next 3DSTATE_CONSTANT packet you emit for
that stage may not work correctly.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
Mesa uses limits.h elsewhere, and this makes is possible to
compile anv_allocator.c on Android.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Such a surface is not possible on our hardware. Without this change, ISL
surface creation would fail with the next patch.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Before we were caching the prog data but we weren't doing anything with
brw_stage_prog_data::param so anything with push constants wasn't getting
cached properly. This commit fixes that.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While we can simply calculate offsets to get to things such as the
prog_data and the key, it's much more user-friendly if there are just
pointers. Also, it's a bit more fool-proof.
While we're at it, we rework the pipeline cache API to use the
brw_stage_prog_data type directly.
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98012
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The case where we just want the loop to continue is INCOMPATIBLE_DRIVER
because that simply means that whatever FD we opened isn't a supported
Intel chip. Other error codes such as OUT_OF_HOST_MEMORY are actual errors
and we should be returning early in that case.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Turning this :
sampler state 0
Sampler Disable: false
Texture Border Color Mode: 0
LOD PreClamp Enable: 1
Base Mip Level: 0.000000
Mip Mode Filter: 0
Mag Mode Filter: 1
Min Mode Filter: 1
Texture LOD Bias: foo
Anisotropic Algorithm: 0
into this :
sampler state 0
Sampler Disable: false
Texture Border Color Mode: 0 (DX10/OGL)
LOD PreClamp Enable: 1 (OGL)
Base Mip Level: 0.000000
Mip Mode Filter: 0 (NONE)
Mag Mode Filter: 1 (LINEAR)
Min Mode Filter: 1 (LINEAR)
Texture LOD Bias: foo
Anisotropic Algorithm: 0 (LEGACY)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Sirisha Gandikota<[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Sirisha Gandikota<[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Sirisha Gandikota<[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes more sense than OUT_OF_HOST_MEMORY. Technically, you can
recover from a failed execbuf2 but the batch you just submitted didn't
fully execute so things are in an ill-defined state. The app doesn't want
to continue from that point anyway.
Signed-off-by: Jason Ekstrand <[email protected]>
Cc: "13.0" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
| |
We require 12 bytes of headers but in some cases we just need 4.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we were creating the shader with a NULL ralloc context and then
trusting in blorp_compile_fs to clean it up. The only problem was that
blorp_compile_fs didn't clean up its context properly so we were leaking.
When I went to fix that, I realized that it couldn't because it has to
return the shader binary which is allocated off of that context and used by
the caller. The solution is to make blorp_compile_fs take a ralloc
context, allocate the nir_shaders directly off that context, and clean it
all up in whatever function creates the shader and calls blorp_compile_fs.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Cc: "12.0, 13.0" <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With dealing with rectangles in compressed images, you can have a width or
height that isn't a multiple of the corresponding compression block
dimension but only if that edge of your rectangle is on the edge of the
image. When we call convert_to_single_slice, it creates an 2-D image and a
set of tile offsets into that image. When detecting the right-edge and
bottom-edge cases, we weren't including the tile offsets so the assert
would misfire. This caused crashes in a few UE4 demos
Signed-off-by: Jason Ekstrand <[email protected]>
Reported-by: "Eero Tamminen" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98431
Cc: "13.0" <[email protected]>
Tested-by: "Eero Tamminen" <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here brw_setup_vue_interpolation() is rewritten not to use the InterpQualifier
array in gl_fragment_program which will allow us to remove it.
This change also makes the code which is only used by gen4/5 more self contained
as it now has its own gen5_fragment_program struct rather than storing the map
in brw_context. This means the interpolation map will only get processed once
and will get stored in the in memory cache rather than being processed everytime
the fs changes.
Also by calling this from the fs compile code rather than from the upload code
and using the interpolation assigned there we can get rid of the
BRW_NEW_INTERPOLATION_MAP flag.
It might not seem ideal to add a gen5_fragment_program struct however by the end
of this series we will have gotten rid of all the brw_{shader_stage}_program
structs and replaced them with a generic brw_program struct so there will only
be two program structs which is better than what we have now.
V2: Don't remove BRW_NEW_INTERPOLATION_MAP from dirty_bit_map until the following
patch to fix build error.
V3 - Suggestions by Jason:
- name struct gen4_fragment_program rather than gen5_fragment_program
- don't use enum with memset()
- create interp mode set helper and simplify logic to call it
- add assert when calling function to show prog will never be NULL for
gen4/5 i.e. no Vulkan
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
When restoring something from shader cache we won't have and don't
want to create a nir_shader this change detaches the two.
There are other advantages such as being able to reuse the
shader info populated by GLSL IR.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
git history shows "abi_versions" was used from the outset.
Cc: <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98415
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With the isl_format_supports* helpers, we can now conveniently
report support for this format on Cherry View.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92925
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan has a multi-arch problem... The idea behind the Vulkan loader is
that you have a little json file on your disk that tells the loader where
to find drivers. The loader looks for these json files in standard
locations, and then goes and loads the my_driver.so's that they specify.
This allows you as a driver implementer to put their driver wherever on the
disk they want so long as the ICD points in the right place.
For a multi-arch system, however, you may have multiple libvulkan_intel.so
files installed that the loader needs to pick depending on architecture.
Since the ICD file format does not specify any architecture information,
you can't tell the loader where to find the 32-bit version vs. the 64-bit
version. The way that packagers have been dealing with this is to place
libvulkan_intel.so in the top level lib directory and provide just a name
(and no path) to the loader. It will then use the regular system search
paths and find the correct driver. While this solution works fine for
distro-installed Vulkan drivers, it doesn't work so well for user-installed
drivers because they may put it in /opt or $HOME/.local or some other more
exotic location. In this case, you can't use an ICD json file with just a
library name because it doesn't know where to find it; you also have to add
that to your library lookup path via LD_LIBRARY_PATH or similar.
This patch handles both use-cases by taking advantage of the fact that the
loader dlopen()s each of the drivers and, if one dlopen() calls fails, it
silently continues on to open other drivers. By suffixing the icd file, we
can provide two different json files: intel_icd.x86_64.json and
intel_icd.i686.json with different paths. Since dlopen() will only succeed
on the libvulkan_intel.so of the right arch, the loader will happily ignore
the others and load that one. This allows us to properly handle multi-arch
while still providing a full path so user installs will work fine.
I tested this on my Fedora 25 machine with 32 and 64-bit builds of our
Vulkan driver installed and 32 and 64-bit builds of crucible. It seems to
work just fine.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
|
|
|
| |
I can't see this being used anywhere.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
This moves the shared code to a common subdirectory
and makes anv linked to that code instead of the copy
it was using.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
the WSI code should be now be clean for sharing.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Next task is to rename all the anv_ out of this,
and move to a common location
Reviewed-by: Jason Ekstrand <[email protected]>
|