| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Now that we're tracking aux properly per-slice, we can enable this for
applications which actually care.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit completely reworks aux tracking. This includes a number of
somewhat distinct changes:
1) Since we are no longer fast-clearing multiple slices, we only need
to track one fast clear color and one fast clear type.
2) We store two bits for fast clear instead of one to let us
distinguish between zero and non-zero fast clear colors. This is
needed so that we can do full resolves when transitioning to
PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear
values in all sorts of places we wouldn't normally.
3) We now track compression state as a boolean separate from fast clear
type and this is tracked on a per-slice granularity.
The previous scheme had some issues when it came to individual slices of
a multi-LOD images. In particular, we only tracked "needs resolve"
per-LOD but you could do a vkCmdPipelineBarrier that would only resolve
a portion of the image and would set "needs resolve" to false anyway.
Also, any transition from an undefined layout would reset the clear
color for the entire LOD regardless of whether or not there was some
clear color on some other slice.
As far as full/partial resolves go, he assumptions of the previous
scheme held because the one case where we do need a full resolve when
CCS_E is enabled is for window-system images. Since we only ever
allowed X-tiled window-system images, CCS was entirely disabled on gen9+
and we never got CCS_E. With the advent of Y-tiled window-system
buffers, we now need to properly support doing a full resolve of images
marked CCS_E.
v2 (Jason Ekstrand):
- Fix an bug in the compressed flag offset calculation
- Treat 3D images as multi-slice for the purposes of resolve tracking
v3 (Jason Ekstrand):
- Set the compressed flag whenever we fast-clear
- Simplify the resolve predicate computation logic
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Even though the blorp pass looks a bit on the sketchy side, the end
result in the Vulkan driver is very nice. Instead of having this weird
case where you do a fast clear and then maybe have to resolve, we just
do the ambiguate and are done with it. The ambiguate does exactly what
we want of setting all the CCS values to 0 which puts it into the
pass-through state.
This should also improve performance a bit in certain cases. For
instance, if we did a transition from UNDEFINED to GENERAL for a surface
that doesn't have CCS enabled all the time, we would end up doing a
fast-clear and then a full resolve which ends up touching every byte in
the main surface as well as the CCS. With the ambiguate pass, that
transition only touches the CCS.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that this isn't a multi-case if and it's just the one case, it's a
bit clearer if the condition is just part of the if instead of being
pulled out into a boolean variable.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This pass performs an "ambiguate" operation on a CCS-compressed surface
by manually writing zeros into the CCS. On gen8+, ISL gives us a fairly
detailed notion of how the CCS is laid out so this is fairly simple to
do. On gen7, the CCS tiling is quite crazy but that isn't an issue
because we can only do CCS on single-slice images so we can just blast
over the entire CCS buffer if we want to.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current strategy we use for managing resolves has an issues where we
track clear colors and the need for resolves per-LOD but we still allow
resolves of only a subset of the slices in any given LOD and doing so
sets the "needs resolve" flag for that LOD to false while leaving the
remaining layers unresolved. This patch is only the first step and does
not, by itself fix anything. However, it's fairly self-contained and
splitting it out means any performance regressions should bisect to this
nice obvious commit rather than to the giant "rework aux tracking"
commit.
Nanley and I did some testing and none of the applications we tested
even tried to fast-clear anything other than the first slice of an
image. The test was done by adding a printf right before we call
blorp_fast_clear if we were every going to touch any slice other than
the first with a fast-clear. Due to the way the original code was
structured, this would not have included applications which only cleared
a subset of layers. The applications tested were:
* All Sascha Willems demos
* Aztec Ruins
* Dota 2
* The Talos Principle
* Mad Max
* Warhammer 40,000: Dawn of War III
* Serious Sam Fusion 2017: BFE
While not the full list of shipping applications, it's a pretty good
spread and covers most of the engines we've seen running on our driver.
If this is ever shown to be a performance problem in the future, we can
reconsider our strategy.
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Currently, this helper does nothing but we call it every place where an
image is written through the render pipeline. This will allow us to
properly mark the aux state so that we can handle resolves correctly.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
This is copied and pasted from the similar macro we added to ISL.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves it to being based on layout_to_aux_usage instead of being
hard-coded based on bits of a priori knowledge of how transitions
interact with layouts. This conceptually simplifies things because
we're now using layout_to_aux_usage and layout_supports_fast_clear to
make resolve decisions so changes to those functions will do what one
expects.
There is a potential bug with window system integration on gen9+ where
we wouldn't do a resolve when transitioning to the PRESENT_SRC layout
because we just assume that everything that handles CCS_E can handle it
all the time. When handing a CCS_E image off to the window system, we
may need to do a full resolve if the window system does not support the
CCS_E modifier. The only reason why this hasn't been a problem yet is
because we don't support modifiers in Vulkan WSI and so we always get X
tiling which implies no CCS on gen9+. This patch doesn't actually fix
that bug yet but it takes us the first step in that direction by making
us actually pick the correct resolve op. In order to handle all of the
cases, we need more detailed aux tracking.
v2 (Jason Ekstrand):
- Make a few more things const
- Use the anv_fast_clear_support enum
v3 (Jason Ekstrand):
- Move an assert and add a better comment
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason Ekstrand):
- Return an enum instead of a boolean
v3 (Jason Ekstrand):
- Return ANV_FAST_CLEAR_NONE instead of false (Topi)
- Rename ANV_FAST_CLEAR_ANY to ANV_FAST_CLEAR_DEFAULT_VALUE
- Add documentation for the enum values
v4 (Jason Ekstrand):
- Remove a dead comment
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
| |
This got lost in all of the aspect vs. plane rebasing of YCBCR.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old
behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what
we want for blits/copies.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This replaces image_fast_clear and ccs_resolve with two new helpers that
simply perform an isl_aux_op whatever that may be on CCS or MCS. This
is a bit cleaner as it separates performing the aux operation from which
blorp helper we have to call to do it.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Right now, we have different entrypoints and enums in blorp for these
different operations. This provides us a central enum which we can
begin to transition to.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a build option to control building some of the misc tools we
have. Also set the executables to install, presumably you want
that if you're asking for the build.
v2: set 'install:' to the with_tools value, not true (Jordan)
handle 'all' in a the comma list (Dylan)
Add freedreno's tools (Dylan)
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
| |
This has been unused since 8761a04d0d93.
Reviewed-by: Elie Tournier <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The loop goes through the list of enabled extensions marking them as
enabled in the list, but this relies on every other extension being
initialized to false by default.
This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Fixes: abc62282b5c "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and very specially when the divmod64 lowering applies,
since its lowering is quite expensive.
Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).
With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.
v2: Run nir_algebraic first so we can hopefully get rid of some of
the int64 instructions before we even attempt to lower them.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Ken called this out in review, but it seems I forgot to make the change.
I noticed that the control flow annotations in the fragment shader
disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not
correct, and moving this line to the correct place fixes it.
|
|
|
|
|
|
| |
The count field is in terms of dwords and not bytes. In
7d4007d58ab7c0c1796e116b55814f8be4e699a9, I fixed one instance
of this but missed another.
|
|
|
|
| |
Trivial. DS is TES, HS is TCS.
|
|
|
|
|
|
|
|
|
|
| |
If we ever hit this edge-case, it can theoretically cause problem for
CNL because we could end up changing render targets without re-emitting
3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the
edge case.
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Emit it on all platforms since gen7.
Signed-off-by: Rafael Antognolli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allows nir drivers to either use a single or dual locations for
vs double inputs.
i965 uses dual locations for both OpenGL and Vulkan drivers, for
now gallium OpenGL drivers only use a single location.
The following patch will also make use of this option when
calling nir_shader_gather_info().
Reviewed-by: Karol Herbst <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.
We add the new field because c2acf97fcc9b changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I got reviews and fixed the patches locally, but ended up merging the
ones that I sent originally to the list. This patch fixes those
mistakes.
Fixes: 78c125af3904c539ea69bec2dd9fdf7a5162854f
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The GPU hang caused by push constants is apparently fixed, so let's
enable them again.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: "18.0" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to the GL driver, ignore 3DSTATE_CONSTANT_* packets when doing a
context restore.
Signed-off-by: Rafael Antognolli <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Cc: "18.0" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
It no longer has any users.
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We need to access the pipeline layout to compute correct dynamic
offsets for dyamic UBO/SSBO descriptors when we emit draw commands.
Instead of taking it from the pipeline object, store the layout
in the command buffer pipeline state.
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.
There are two solutions to this, one is to use reference counting on
pipeline layout objects. The other is to avoid holding references to
pipeline layouts where they are not really needed.
This patch takes a step towards the second option by making the
pipeline shader compile code take pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.
A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.
v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
v2: allocate off the device allocator with DEVICE scope (Jason)
Fixes the following work-in-progress CTS tests:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute
Suggested-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this, we may end up dereferencing blend before we check for
binding->index != UINT32_MAX. However, Vulkan allows the blend state to
be NULL so long as you don't have any color attachments. This fixes a
segfault when running The Talos Principal.
Fixes: 12f4e00b69e724a23504b7bd3958fb75dc462950
Cc: [email protected]
Reviewed-by: Alex Smith <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Sort the output to ensure build reproducibility
Signed-off-by: Maxin B. John <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Fixes: 0ab04ba979b ("anv: Use python to generate ICD json files")
Reviewed-by: Dylan Baker <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
18fde36ced4279f2577097a1a7d31b55f2f5f141 changed the way temporary
registers were allocated in lower_integer_multiplication so that we
allocate regs_written(inst) space and keep the stride of the original
destination register. This was to ensure that any MUL which originally
followed the CHV/BXT integer multiply regioning restrictions would
continue to follow those restrictions even after lowering. This works
fine except that I forgot to reset the register file to VGRF so, even
though they were assigned a number from alloc.allocate(), they had the
wrong register file. This caused some GLES 3.0 CTS tests to start
failing on Sandy Bridge due to attempted reads from the MRF:
ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
This commit remedies this problem by, instead of copying inst->dst and
overwriting nr, just make a new register and set the region to match
inst->dst.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
Fixes: 18fde36ced4279f2577097a1a7d31b55f2f5f141
Cc: "17.3" <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Fixes: dd088d4bec7 ("anv/extensions: Generate a header file with
extension tables")
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
| |
The count field is in terms of dwords and not bytes.
|
|
|
|
|
|
|
|
| |
Looks like checking both sources was intended, instead of the first one
twice. Found with Coccinelle, coccinellery/xand/xand.cocci semantic patch.
Signed-off-by: Grazvydas Ignotas <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Józef Kucia <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "18.0" <[email protected]>
|
|
|
|
|
|
|
|
| |
While we're here, make it an anv_address.
Tested-by: Józef Kucia <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "18.0" <[email protected]>
|
|
|
|
|
|
| |
Tested-by: Józef Kucia <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "18.0" <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We were already doing this for some packets to keep the lines shorter.
We may as well just do it for all of them.
Tested-by: Józef Kucia <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
Cc: "18.0" <[email protected]>
|