| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This change helps with some of the dEQP-VK.wsi.android.* tests that
try to create a swapchain using such formats.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
| |
Remove the if tools condition and just put it through the install:
parameter.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Since we don't support streaming an aub file, we can drop the decoding
status enum.
v2: include stdbool (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
| |
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.
Fixes: 144b40db5411 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tarball distribution is done through "make distcheck". We include the
meson targets also into autotools so they won't fail when building
from the tarball.
Fixes: 6a60beba408 ("intel/tools: Add an error state to aub translator")
Cc: Jason Ekstrand <[email protected]>
Cc: Lionel Landwerlin <[email protected]>
Cc: Dylan Baker <[email protected]>
Signed-off-by: Andres Gomez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
| |
This appears to help the Aztec Ruins benchmark by about 2% on my Kaby
Lake gt2 laptop.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This leaves us with a series of little anv_pipeline_compile_* functions
which each take a compiler object, a mem_ctx, the stage to compile, and
the previous stage for VUE linking purposes. Some of them do
interesting things but most are little more than wrappers around
brw_compile_*.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This breaks compilation up a bit into "link" and "compile". In the
"link" stage, new anv_pipeline_link_* helpers are called which are
responsible for setting up the binding table and doing anything needed
to properly link with the next stage in the pipeline if one exists.
They are called in reverse order starting with the fragment shader so
you can assume linking in later stages is already done.
Reviewed-by: Timothy Arceri <[email protected]>
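
The reverse-order linking described above can be sketched as follows. This is an illustrative Python model, not the driver's C code: the stage names, `STAGE_ORDER`, and `link_pipeline` are hypothetical, and the only point shown is that walking from the fragment shader backwards means each stage sees an already-linked successor.

```python
# Hypothetical stage list in pipeline order (names are assumptions).
STAGE_ORDER = ['vertex', 'tess_ctrl', 'tess_eval', 'geometry', 'fragment']


def link_pipeline(active_stages):
    """Return (stage, next_stage) pairs in the order they would be linked.

    Stages are visited last-to-first, so when a stage is linked, the
    stage that consumes its outputs has already been processed.
    """
    linked = []
    next_stage = None
    for stage in reversed(STAGE_ORDER):
        if stage not in active_stages:
            continue
        linked.append((stage, next_stage))
        next_stage = stage
    return linked


order = link_pipeline({'vertex', 'fragment'})
print(order)  # [('fragment', None), ('vertex', 'fragment')]
```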
|
|
|
|
|
|
| |
This pulls the SPIR-V to NIR step out into common code.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
| |
We can set active_stages much more directly and then it's just candy
around setting pipeline->stages[stage].
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Instead of hashing each stage separately (and TES and TCS together), we
hash the entire pipeline. This means we'll get fewer cache hits if
they, for instance, re-use the same VS over and over again but it also
means we can now safely do cross-stage optimizations.
Reviewed-by: Timothy Arceri <[email protected]>
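
A minimal sketch of the trade-off described above, in Python rather than the driver's C. The tuple layout `(name, shader_bytes, key_bytes)` and the function `hash_pipeline` are assumptions made for illustration; the real cache key contains more state than this.

```python
import hashlib
import struct


def hash_pipeline(stages):
    """Hash every stage of a pipeline into one digest.

    `stages` is a list of (name, shader_bytes, key_bytes) tuples; this
    layout is hypothetical and only serves the illustration.
    """
    h = hashlib.sha1()
    for name, shader, key in stages:
        for part in (name.encode('ascii'), shader, key):
            # Length-prefix each field so field boundaries are unambiguous.
            h.update(struct.pack('<I', len(part)))
            h.update(part)
    return h.hexdigest()


vs = ('vertex', b'vs-code', b'vs-key')
fs_a = ('fragment', b'fs-code-a', b'fs-key')
fs_b = ('fragment', b'fs-code-b', b'fs-key')

# The same VS paired with a different FS now yields a different hash,
# so the VS alone no longer produces a cache hit.
print(hash_pipeline([vs, fs_a]) == hash_pipeline([vs, fs_b]))  # False
```

Because the digest covers all stages, any result cached under it may depend on cross-stage optimizations without risk of being reused with a different neighbouring stage.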
|
|
|
|
|
|
|
|
| |
Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During code review, Jason pointed out that:
2b3064c0731 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"
Didn't account for INTEL_SCALER_* environment variables.
To fix this, let the compiler return the disk_cache driver flags.
Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)
Cc: Jason Ekstrand <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader time hard codes an index of the shader time buffer within the
gen program.
In order to support shader time in the disk shader cache, we'd need to
add the shader time index into the program key. This should work, but
probably is not worth it for this particular debug feature.
Therefore, let's just disable the disk shader cache if the shader time
debug feature is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382
Fixes: 96fe36f7acc "i965: Enable disk shader cache by default"
Cc: Eero Tamminen <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
They don't really do anything interesting, but it's more consistent this
way.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
| |
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass. This lets us potentially
throw away chunks of the fragment shader. In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have. Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference. Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
instructions in affected programs: 2390284 -> 2296942 (-3.91%)
helped: 16469
HURT: 505
total loops in shared programs: 4954 -> 4951 (-0.06%)
loops in affected programs: 3 -> 0
helped: 3
HURT: 0
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables. This can help link-time optimization by allowing us to
remove varyings at a more granular level.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
instructions in affected programs: 79857 -> 70706 (-11.46%)
helped: 392
HURT: 0
Reviewed-by: Timothy Arceri <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others. This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison. Previously, we did the interpolation wrong consistently in
both versions. However, with one of Tim Arceri's NIR linking patches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version. As of this commit, they both get
correctly interpolated.
Fixes: e61cc87c757f8bc "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end
of the context image.
We select the largest size for the context image regardless of the
generation.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In both Python 2 and 3, zlib.Compress.compress() takes a byte string,
and returns a byte string as well.
In Python 2, the script was working because:
1. string literals were byte strings;
2. opening a file in unicode mode, reading from it, then passing the
unicode string to compress() would automatically encode to a byte
string;
On Python 3, the above two points are not valid any more, so:
1. zlib.Compress.compress() refuses the passed unicode string;
2. compressed_data, defined as an empty unicode string literal, can't be
concatenated with the byte string returned by compress();
This commit fixes this by explicitly using byte strings where
appropriate, so that the script works on both Python 2 and 3.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
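
A minimal sketch of the byte-string handling described above. The function name `compress_text` and the UTF-8 encoding are assumptions; the actual script is not shown here and may differ.

```python
import zlib


def compress_text(text):
    """Compress a unicode string and return a byte string."""
    compressor = zlib.compressobj()
    # The accumulator must be a byte string (b''), not '' which is
    # unicode on Python 3 and cannot be concatenated with bytes.
    compressed_data = b''
    # Encode explicitly: compress() only accepts byte strings.
    compressed_data += compressor.compress(text.encode('utf-8'))
    compressed_data += compressor.flush()
    return compressed_data


blob = compress_text(u'hello')
print(zlib.decompress(blob) == b'hello')  # True
```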
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The XML parser wants byte strings, not unicode strings.
In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').
On Python 2, the read() method of the file object will return byte
strings, while on Python 3 it will return unicode strings.
Explicitly specifying the binary mode ('rb') makes the behaviour
identical in both Python 2 and 3, returning what the XML parser
expects.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
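
The effect of opening the file in binary mode can be sketched as below. `xml.etree.ElementTree` is used here as a stand-in parser, and `load_xml` and the sample document are hypothetical; the script touched by this commit may use a different XML API.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_xml(path):
    # 'rb' makes read() return a byte string on both Python 2 and 3.
    with open(path, 'rb') as f:
        data = f.read()
    return ET.fromstring(data)


# Write a small sample document to a temporary file, then parse it.
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'wb') as f:
    f.write(b'<?xml version="1.0" encoding="UTF-8"?><genxml name="demo"/>')
root = load_xml(path)
os.remove(path)

print('%s %s' % (root.tag, root.get('name')))  # genxml demo
```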
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Python 2, iterating over a byte-string yields single-byte strings,
and we can pass them to ord() to get the corresponding integer.
In Python 3, iterating over a byte-string directly yields those
integers.
Transforming the byte string into a bytearray gives us a list of the
integers corresponding to each byte in the string, removing the need to
call ord().
This makes the script compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
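
The bytearray conversion described above can be sketched as follows; `byte_values` is a hypothetical helper name, not taken from the script.

```python
def byte_values(data):
    """Return the integer value of each byte in a byte string.

    Iterating a bytearray yields integers on both Python 2 and 3,
    so no call to ord() is needed.
    """
    return [b for b in bytearray(data)]


print(byte_values(b'\x00A\xff'))  # [0, 65, 255]
```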
|
|
|
|
|
|
| |
Fixes VK-GL-CTS CL#2567
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests
disabling prefetching of binding tables for ICLLP A0 and B0
steppings. It fixes multiple GPU hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.
Anuj: Add comments and commit message.
Add gen 11 checks in the code.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
The pass can create a temporary result for the instruction and then
move from it to the original destination. However, if the original
instruction was predicated, the mov has to be predicated as well.
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases. Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward. This simplified the
passes, but made life painful for the callers.
This patch combines both into a single pass. If you give it a non-zero
static count, it uses that. If you give it Mesa state slots, it turns
it back into a built-in uniform. Otherwise, it does nothing.
This also moves the i965 uniform lowering out to shared code.
v2: Make token arrays const.
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
| |
These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we
would have already hit the unreachable() in emit_system_values_block().
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The various base addresses are simply addresses. There may or may not
be a buffer located at those addresses. So, it doesn't make much sense
to request one. Just save the raw address so we can add it later, when
asking about BOs at the final <base + offset> address.
Suggested-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.
I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.
To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data. (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)
v2: Fix things caught in review by Lionel:
- Drop bogus bind_bo.size check.
- Drop "get the BOs again" code - we just get the BOs as needed
- Add a message about interface descriptor data being unavailable
Reviewed-by: Lionel Landwerlin <[email protected]>
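
The lookup scheme described above can be sketched in Python as follows. The class `AddressSpace`, its method names, and the addresses are all hypothetical; the real decoder is written in C and asks the driver for BO maps through its own callback. The sketch only shows the idea: remember the raw base address, and resolve a buffer at the final base + offset address.

```python
class AddressSpace(object):
    """Toy model: saved base addresses plus a list of mapped buffers."""

    def __init__(self):
        self.bases = {}   # base name -> raw address, kept across batches
        self.bos = []     # list of (start_address, data) pairs

    def set_base(self, name, address):
        # Save the raw address; no buffer needs to exist there.
        self.bases[name] = address

    def add_bo(self, start, data):
        self.bos.append((start, data))

    def read(self, name, offset, size):
        # Look up the buffer at the final <base + offset> address.
        addr = self.bases[name] + offset
        for start, data in self.bos:
            if start <= addr and addr + size <= start + len(data):
                return data[addr - start:addr - start + size]
        return None  # no buffer covers this range


space = AddressSpace()
space.set_base('dynamic_state', 0x100000000)
# The buffer lives inside the region, not at the base address itself.
space.add_bo(0x100002000, b'\x01\x02\x03\x04\x05\x06\x07\x08')

print(space.read('dynamic_state', 0x2004, 4) == b'\x05\x06\x07\x08')  # True
print(space.read('dynamic_state', 0, 4))  # None
```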
|
|
|
|
|
|
|
|
| |
CovID: 1438132
Fixes: a99c9e63a07477634ab73 "anv: finish the binding_table_pool on
destroyDevice when use_softpin"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jose Maria Casanova Crespo <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
|
|
|
|
|
|
|
|
| |
We might fail on master node drm fd because we won't have the right
permissions.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|