| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Meta was using it before it was set. I suspect we typically don't
want to dump meta shaders, so just set it to false in the beginning.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Debugging a shader-db reported cycle count regression from the tex
coalescing, I eventually figured out that the texture latencies were
totally bogus. Really fixing it will probably involve mirroring
vc4_qir_schedule.c's texture fifo management here.
|
|
|
|
|
|
|
|
|
|
|
| |
This isn't as complete as I would like (can't merge interpolation because
of the implicit r5 dependency, doesn't work with control flow), but this
was cheap and easy.
Improves 3DMMES Taiji performance by 1.15353% +/- 0.299896% (n=29, 16)
total instructions in shared programs: 99810 -> 99059 (-0.75%)
instructions in affected programs: 10705 -> 9954 (-7.02%)
|
|
|
|
|
| |
For texturing, there won't be a fixed limit on how many writes there are,
so we need to compute uses up front.
|
|
|
|
|
| |
The dead code elimination wants it to be safe, and I actually got
segfaults due to it being unsafe with the new coalescing pass.
|
|
|
|
|
|
| |
The VPM write logic will be basically the same as the texture coordinate
write logic we need, and it's not really related to the VPM read logic
other than the reuse of the use_count array.
|
|
|
|
|
| |
For now we're still just generating MOVs, but this will let us fold into
other ops in the future. No difference on shader-db.
|
|
|
|
|
|
| |
Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
|
|
|
|
|
|
| |
This may have made a tiny bit of sense when we had one 4-arg inst per
shader, but if we only ever put 2 things in, having a pointer to 2 things
almost every instruction is pointless indirection.
|
|
|
|
|
| |
This was used originally for unorm4x8 packs, but we now represent those as
a series of packed movs.
|
|
|
|
|
|
|
|
| |
This matches what NVIDIA and AMD hardware expose, as well as what Intel
hardware supports.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
This matches the capabilities of the hardware.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Offsets that don't fit into 4 bits need to force gather_po to be
selected. Adjust the logic so that this happens.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we had an OFFSET_VALUE source for logical texture instructions
that was intended to mean exactly what it says, "offset". In reality, we
only fully used it for tg4 offsets. We used offset_value.file == IMM to
mean, "you have a constant offset, go look in instr->offset" and didn't
actually use the contents of the register at all in that case except for
in nir_emit_texture where we used it as a temporary before we copy it into
instr->offset.
This commit renames OFFSET_VALUE to TG4_OFFSET and restricts its usage to
indirect tg4 offsets only. The nir_emit_texture code is refactored so that
we explicitly build a header_bits value which is placed in instr->offset
and the constant offset values (both for tg4 and regular texture
operations) are used to construct header_bits and don't go through the
offset source at all. Finally, we stop passing offset_value in to
lower_sampler_logical_send_gen5 because we can't do indirect offsets until
gen7 anyway.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Not sure about the deprecation path, but this intrinsic
can be lowered to SMEM loads. This results in a significant
Talos performance improvement.
v2: Fix for LLVM attribute changes.
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
07fe2d565b introduced a big hack in order to return
NumSubroutineUniforms when querying ACTIVE_RESOURCES for
<shader>_SUBROUTINE_UNIFORM interfaces. However this is the
wrong fix we are meant to be returning the number of active
resources i.e. the count of subroutine uniforms in the
resource list which is what the code was previously doing,
anything else will cause trouble when trying to retrieve
the resource properties based on the ACTIVE_RESOURCES count.
The real problem is that NumSubroutineUniforms was counting
array elements as separate uniforms but the innermost array
is always considered a single uniform so we fix that count
instead which was counted incorrectly in 7fa0250f9.
Idealy we could probably completely remove
NumSubroutineUniforms and just compute its value when needed
from the resource list but this works for now.
Reviewed-by: Alejandro Piñeiro <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Cc: 13.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The 1-D special case doesn't actually apply to depth or HiZ. I discovered
this while converting BLORP over to genxml and ISL. The reason is that the
1-D special case only applies to the new Sky Lake 1-D layout which is only
used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers,
the old gen4 2-D layout is used and the QPitch should be in rows.
Reviewed-by: Nanley Chery <[email protected]>
Cc: "13.0" <[email protected]>
|
|
|
|
| |
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
|
|
|
| |
This was already piped through in the CmdDraw(Indexed)Indirect handling.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
The gen7/8_cmd_buffer logic already sets the clamp, and it's piped
through via the dynamic state.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This matches maxImageArrayLayers, as well as the same setting in the GL
frontend.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
These are all regularly available in desktop GL, so the backend fully
supports them.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This appears to be fully supported already.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
Reported-by: Ilia Mirkin <[email protected]>
Cc: "13.0" <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reported-by: Ilia Mirkin <[email protected]>
Cc: "13.0" <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things. One is to pull useful and/or interesting
information from the AUB file header and display it as a header above your
decoded batches. Second, it is now capable of pulling the PCI ID from the
AUB file comment left by intel_aubdump. This removes the need to use the
--gen flag all the time.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This requires that a few more state bits become global.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
This makes it just store the pci_id instead of a struct pointer
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We were reading from the "comment size" dword and incrementing by that
amount. This never caused a problem because that field was always zero.
However, experimenting with actual aub file comments indicates, the
simulator seems to include the comment size in the packet size provided in
the header. We should do the same.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
| |
The helper automatically handles masking for us so we don't have to worry
about whether or not something is in the bottom bits.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
| |
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
This new helper is automatically handles 32 vs. 48-bit GTT issues. It also
handles 48-bit canonical addresses on Broadwell and above.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original aubinator that Kristian wrote had a bug in the handling of
MI_BATCH_BUFFER_START that propagated into the version in upstream mesa.
In particular, it ignored the "2nd level" bit which tells you whether this
MI_BATCH_BUFFER_START is a subroutine call (2nd level) or a goto. Since
the Vulkan driver uses batch chaining, this can lead to a very confusing
interpretation of the batches. In some cases, depending on how things are
laid out in the virtual GTT, you can even end up with infinite loops in
batch processing.
Reviewed-by: Kristian H. Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Among other things, blits would clear existing SO targets which would
cause a bunch of updates from u_blitter to be missed.
Fixes fbo-scissor-blit fbo, probably among many others.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Bruce Cherniak <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
I asked Emil to switch from 0 (success) vs. -1 (fail) to use a boolean
in my review comments. The "not" went missing. Easy mistake, but the
result is that nothing runs at all :)
Fix whitespace while we're here too.
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It fixes leakage of pthread_condattr resource on wsi_queue_init()
Cc: "13.0" <[email protected]>
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Eduardo Lima Mitev <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This updates releasing of resource in reverse order of the anv_CreateDevice
to anv_DestroyDevice.
And it fixes resource leak in pthread_mutex, pthread_cond, anv_gem_context.
Cc: "13.0" <[email protected]>
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
anv_queue_init() always returns VK_SUCCESS, so caller does not need
to check return value of anv_queue_init().
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the memfd_create() and u_vector_init() fail on anv_block_pool_init(),
this patch makes to return VK_ERROR_INITIALIZATION_FAILED.
All of initialization success on anv_block_pool_init(), it makes to return
VK_SUCCESS.
CID 1394319
v2: Fixes from Emil's review:
a) Add the return type for propagating the return value to caller.
b) Changed anv_block_pool_init() to return VK_ERROR_INITIALIZATION_FAILED
on failure of initialization.
Cc: "13.0" <[email protected]>
Signed-off-by: Mun Gwan-gyeong <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
picture order count can be a negative value
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If radv_device_get_cache_uuid() fails result will be VK_SUCCESS as set
by the radv_init_wsi() call above.
Fixes: d943839 (radv: Use library mtime for cache UUID.)
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
radv_amdgpu_winsys_create() does not take ownership of the fd, thus we
end up leaking it as we return with VK_SUCCESS.
Cc: Dave Airlie <[email protected]>
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_compiler_create() rzalloc-ates memory which we forgot to free.
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Cc: "13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Good bye, you shall not be missed.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
No longer used as of last commit.
Signed-off-by: Emil Velikov <[email protected]>
|