| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to use the link-optimized shader for determining binding
table layouts and, more importantly, URB layouts. For apps running on
DXVK, this is extremely important as DXVK likes to declare max-size
inputs and outputs and this lets is massively shrink our URB space
requirements.
VkPipeline-db results (Batman pipelines only) on KBL:
total instructions in shared programs: 820403 -> 790008 (-3.70%)
instructions in affected programs: 273759 -> 243364 (-11.10%)
helped: 622
HURT: 42
total spills in shared programs: 8449 -> 5212 (-38.31%)
spills in affected programs: 3427 -> 190 (-94.46%)
helped: 607
HURT: 2
total fills in shared programs: 11638 -> 6067 (-47.87%)
fills in affected programs: 5879 -> 308 (-94.76%)
helped: 606
HURT: 3
Looking at shaders by hand, it makes the URB between TCS and TES go from
containing 32 per-vertex varyings per tessellation shader pair to a more
reasonable 8-12. For a 3-vertex patch, that's at least half the URB
space no matter how big the patch section is.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We want these to be set as close to the final compile as possible so
that they are guaranteed to happen after nir_shader_gather_info is
called. The next commit is going to move nir_shader_gather_info to
after the linking step which makes this necessary.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit makes three changes. One is to only walk the descriptors once
and set bind map sizes at the same time as filling out the entries. The
second is to make the pass additive so that we can put stuff in the bind
map before applying the pipeline layout. Third, we switch to using
designated initializers.
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Because lower_ycbcr gets called before apply_pipeline_layout, the
indices are all logical and the binding layout HW size is actually too
big for the bounds check. We should just use the regular logical array
size instead.
Fixes: f3e91e78a33 "anv: add nir lowering pass for ycbcr textures"
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
`device` is used 2 lines below, even visible in the diff context printed.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4434591bf56a6b0 caused substantially more URB messages in
geometry and tessellation shaders. Before we can really enable this
sort of optimization, We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf56a6 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <[email protected]>
Tested-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS
destinations were broken was that an earlier layer was swapping it
out for B8G8R8A8_UNORM. That made Z24X8 -> Z24X8 blits work.
However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken.
The old code only considered one format at a time, without thinking
that format conversion may need to occur.
This patch moves the translation out to a place where it can consider
both formats. If both are Z24X8, we continue using B8G8R8A8_UNORM to
avoid having to do shader math workarounds. If we have a Z24X8
destination, but a non-matching source, we use our shader hacks to
actually render to it properly.
Fixes: 804856fa5735164cc0733ad0ea62adad39b00ae2 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so
Jason decided to fake it with a bit of shader math and R32_UNORM RTs.
The only problem is that R32_UNORM isn't renderable either...so we've
just traded one bad format for another.
This patch makes us use R32_UINT instead.
Fixes: 804856fa5735164cc0733ad0ea62adad39b00ae2 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The Vulkan 1.1.82 spec flipped the order to better match D3D.
Cc: [email protected]
Reviewed-by: Iago Toral Quiroga <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.
We were generating this instruction in the RT write payload setup:
mov(16) m14<1>F g5<8,8,1>F { align1 compr };
which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers. This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.
I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
Normally, we rely on the register allocator to even-align our virtual
GRFs. But we don't control the payload, so we need to lower SIMD widths
to make it work. To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).
Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <[email protected]>
Tested-by: Diego Viola <[email protected]>
|
|
|
|
|
|
|
|
| |
Cc: Jason Ekstrand <[email protected]>
Fixes: 5b196f39bddc689742d3 "anv/pipeline: Compile to NIR in compile_graphics"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Fixes: 6a60beba4089315685b8 "intel/tools: Add an error state to aub translator"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Python 3 doesn't call objects __cmp__() methods any more to compare
them. Instead, it requires implementing the rich comparison methods
explicitly: __eq__(), __ne(), __lt__(), __le__(), __gt__() and __ge__().
Fortunately Python 2 also supports those.
This commit only implements the comparison methods which are actually
used by the build scripts.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107487
Fixes: 4334196ab325c6w ("intel: tools: simplify meson build")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
| |
This change helps with some of the dEQP-VK.wsi.android.* tests that
try to create swapchain with using such formats.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
Remove the if tools condition and just put it through the install:
parameter.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since we don't support streaming an aub file, we can drop the decoding
status enum.
v2: include stdbool (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.
Fixes: 144b40db5411 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tarball distribution is done through "make distcheck". We include the
meson targets also into autotools so they won't fail when building
from the tarball.
Fixes: 6a60beba408 ("intel/tools: Add an error state to aub translator")
Cc: Jason Ekstrand <[email protected]>
Cc: Lionel Landwerlin <[email protected]>
Cc: Dylan Baker <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
This appears to help the Aztec Ruins benchmark by about 2% on my Kaby
Lake gt2 laptop.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This leaves us with a series of little anv_pipeline_compile_* functions
which each take a compiler object, a mem_ctx, the stage to compile, and
the previous stage for VUE linking purposes. Some of them do
interesting things but most are little more than wrappers around
brw_compile_*.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This breaks compilation up a bit into "link" and "compile". In the
"link" stage, new anv_pipeline_link_* helpers are called which are
responsible for setting up the binding table and doing anything needed
to properly link with the next stage in the pipeline if one exists.
They are called in reverse order starting with the fragment shader so
you can assume linking in later stages is already done.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
| |
This pulls the SPIR-V to NIR step out into common code.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
We can set active_stages much more directly and then it's just candy
around setting pipeline->stages[stage].
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of hashing each stage separately (and TES and TCS together), we
hash the entire pipeline. This means we'll get fewer cache hits if
they, for instance, re-use the same VS over and over again but it also
means we can now safely do cross-stage optimizations.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
| |
Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During code review, Jason pointed out that:
2b3064c0731 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"
Didn't account for INTEL_SCALER_* environment variables.
To fix this, let the compiler return the disk_cache driver flags.
Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)
Cc: Jason Ekstrand <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader time hard codes an index of the shader time buffer within the
gen program.
In order to support shader time in the disk shader cache, we'd need to
add the shader time index into the program key. This should work, but
probably is not worth it for this particular debug feature.
Therefore, let's just disable the disk shader cache if the shader time
debug feature is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382
Fixes: 96fe36f7acc "i965: Enable disk shader cache by default"
Cc: Eero Tamminen <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
They don't really do anything interesting, but it's more consistent this
way.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass. This lets us potentially
throw away chunks of the fragment shader. In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have. Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference. Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
instructions in affected programs: 2390284 -> 2296942 (-3.91%)
helped: 16469
HURT: 505
total loops in shared programs: 4954 -> 4951 (-0.06%)
loops in affected programs: 3 -> 0
helped: 3
HURT: 0
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables. This can help link-time optimization by allowing us to
remove varyings at a more granular level.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
instructions in affected programs: 79857 -> 70706 (-11.46%)
helped: 392
HURT: 0
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others. This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison. Previously, we did the interpolation wrong consistently in
both versions. However, with one of Tim Arceri's NIR linkingpatches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version. As of this commit, they both get
correctly interpolated.
Fixes: e61cc87c757f8bc "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|