| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
It's not a problem if a BO has been allocated larger than we need it
to be.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102940
Fixes: 818b857914 ("anv: Use the BO cache for DeviceMemory allocations")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Daniel Stone <[email protected]>
Cc: [email protected]
(cherry picked from commit c0a4f56fb9945e36a004bc38852fbb801aae3bd5)
|
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
Cc: [email protected]
(cherry picked from commit 91ba331ef4fee34085628af12a188924a28cd808)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This ensures that everything gets cleaned up properly. In particular,
it fixes a memory leak where we were leaking the push constants
structs.
Valgrind stats on
dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 :
Before:
HEAP SUMMARY:
in use at exit: 2,467,513 bytes in 1,305 blocks
total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 1,068 bytes in 11 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
After:
HEAP SUMMARY:
in use at exit: 2,467,381 bytes in 1,304 blocks
total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 936 bytes in 10 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.2 17.1" <[email protected]>
(cherry picked from commit 0763f814d7b5cb4da945d9211faab47e8523fdad)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When writing to set > 0, we were just wrongly writing to set 0. This
commit fixes this by lazily allocating each set as we write to them.
We didn't go for having them directly into the command buffer as this
would require an additional ~45Kb per command buffer.
v2: Allocate push descriptors from system memory rather than in BO
streams. (Lionel)
Cc: "17.2 17.1" <[email protected]>
Fixes: 9f60ed98e501 ("anv: add VK_KHR_push_descriptor support")
Reported-by: Daniel Ribeiro Maciel <[email protected]>
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit d296dea54e243c41c2b3fbe631f7bcb87db6ae8c)
|
|
|
|
|
|
|
|
| |
No shader-db change on Sky Lake.
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
(cherry picked from commit 7463d5058009d1e9d9d01292f894271a26a062b8)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shader-db results on Sky Lake:
total instructions in shared programs: 12954445 -> 12955125 (0.01%)
instructions in affected programs: 141862 -> 142542 (0.48%)
helped: 0
HURT: 626
Reviewed-by: Matt Turner <[email protected]>
Cc: [email protected]
(cherry picked from commit b91ecee04ab2a5b23ff2418b4d709252878cf894)
|
|
|
|
|
|
|
|
|
|
| |
We set a similar default value for LOD in the fs backend for TXS/TXL.
Without this we end up generating invalid MOV with a null src.
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: "17.2 17.1" <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit d3acc240d0ced4efefd18e8c26510bd6a638c9df)
|
|
|
|
|
|
|
|
|
| |
The vkCmdFillBuffer() command fills a buffer with an uint32_t value.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.1 17.2" <[email protected]>
(cherry picked from commit 15fdbf9c39105aaaae05c59a1a315b1775ac5d79)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Atomic operation sources are scalar values, but we were failing to
select the .x component of the second operand. For example,
atomicCounterCompSwapARB(counter, 5u, 10u)
would generate
mov(8) vgrf4.x:D, 5D
mov(8) vgrf5.x:D, 10D
mov(8) vgrf9.x:UD, vgrf4.xyzw:D
mov(8) vgrf9.y:UD, vgrf5.xyzw:D
which wrongly selects the .y component of vgrf5, so the actual 10u value
would get dead code eliminated. The swizzle works for the other source,
but both of them ought to be .xxxx.
Fixes the compare and swap CTS tests in:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase
Cc: "17.2 17.1 17.0 13.0" <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 66342c997fb7892034e537d1456c37e6981c2fdb)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Embarassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.
Fixes: 40dd45d0c6aa4a9d (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit a62fe340981b56fe54e49d3a6791e568f7f87554)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In Vulkan, for 'z' (depth) component, the scale and translate values
for the viewport transformation are:
pz = maxDepth - minDepth
oz = minDepth
zf = pz × zd + oz
Being zd, the third component in vertex's normalized device coordinates.
Fixes: dEQP-VK.draw.inverted_depth_ranges.*
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: [email protected]
(cherry picked from commit d2cd9deeb8db5941e99b149fa637f711198ca741)
|
|
|
|
|
|
|
|
| |
Trivial.
Cc: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 65a09f98ad1411982ce68de7a2d05942d608e674)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes some crashes in the dEQP-VK.memory.requirements.core.* tests.
I'm not sure whether or not passing out-of-bound formats into the query
is supposed to be allowed but there's no harm in protecting ourselves
from it.
Reviewed-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/101956
Cc: [email protected]
(cherry picked from commit 242211933a06826961709c2689a1d30f735ab7b9)
Squashed with:
anv: fix off by one in array check
`anv_formats[ARRAY_SIZE(anv_formats)]` is already one too far.
Spotted by Coverity.
CovID: 1417259
Fixes: 242211933a0682696170 "anv/formats: Nicely handle unknown VkFormat enums"
Cc: Jason Ekstrand <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
(cherry picked from commit 0c7272a66c633b0b11c0b81c0f3552201d083b3a)
|
|
|
|
|
|
|
|
|
|
|
| |
v2 (Jason): Adjust directly in surf_fake_rgb_with_red()
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101910
CC: [email protected]
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
(cherry picked from commit 393ec1a5071263d300e91f43058ed3b594d41418)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BLEND_STATE packing was modified to be variable-length in:
9670124e31 genxml: Make BLEND_STATE command support variable length array.
The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.
Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
(cherry picked from commit d6539608a440f0102f88b375d20143f5f2d0c798)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.
v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
combined depth-stencil.
Cc: "17.2" <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 5563872dbfbf733ed56e1b367bc8944ca59b1c3e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were calculating the total height of 2D surfaces by multiplying the
row pitch by the number of slices. This means that we actually request
slightly more space than actually needed since the padding on the last
slice is unnecessary. For tiled surfaces this is not likely to make a
difference. For linear surfaces, on the other hand, this means we may
require additional memory. In particular, this makes the i965 driver
reject EGL imports of buffers which do not have this extra padding.
Reviewed-by: Jordan Justen <[email protected]>
Cc: "17.2" <[email protected]>
(cherry picked from commit 4d27c6095e8385cccd225993452baad4d2e35420)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The docs contain a bunch of commentary about the need to pad various
surfaces out to multiples of something or other. However, all of those
requirements are about avoiding GTT errors due to missing pages when the
data port or sampler accesses slightly out-of-bounds. However, because
the kernel already fills all the empty space in our GTT with the scratch
page, we never have to worry about faulting due to OOB reads. There are
two caveats to this:
1) There is some potential for issues with caches here if extra data
ends up in a cache we don't expect due to OOB reads. However,
because we always trash the entire cache whenever we need to move
anything between cache domains, this shouldn't be an issue.
2) There is a potential issue if a surface gets placed at the very top
of the GTT by the kernel. In this case, the hardware could
potentially end up trying to read past the top of the GTT. If it
nicely wraps around at the 48-bit (or 32-bit) boundary, then this
shouldn't be an issue thanks to the scratch page. If it doesn't,
then we need to come up with something to handle it.
Up until some of the GL move to ISL, having the padding code in there
just caused us to harmlessly use a bit more memory in Vulkan. However,
now that we're using ISL sizes to validate external dma-buf images,
these padding requirements are causing us to reject otherwise valid
images due to the size of the BO being too small.
Acked-by: Kenneth Graunke <[email protected]>
Tested-by: Tapani Pälli <[email protected]>
Tested-by: Tomasz Figa <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Cc: "17.2" <[email protected]>
(cherry picked from commit c15b92ce1160d742ea431062bbe4b3e818bb2aaf)
|
|
|
|
|
|
|
|
|
|
| |
We can't sample from depth-stencil formats but on gen7 but we can sample
from depth-only formats.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102024
Reviewed-by: Juan A. Suarez Romero <[email protected]>
Cc: [email protected]
(cherry picked from commit 06d3115bb97740a4c8f36c645944a8bd0bde3f68)
|
|
|
|
|
| |
We don't want to advertise experimental extensions in actual releases.
However, there's no harm in leaving the code lying around in the tree.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.
Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
(cherry picked from commit 271fa3a684ef0eefe99087c13d1abb099784163f)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.
Cc: "17.2" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit 31f1863ace73d31a579e5c36252a957818ad09cf)
|
|
|
|
|
|
| |
Cc: "17.2" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit a848e693efc8e2a1d355dc1076409968b374153f)
|
|
|
|
|
|
| |
Fixes: c9cb37b2a6c ("intel/blorp: Add a partial resolve pass for MCS")
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit 5d47dd9c2a7b3a64429aaf7863b5cdad68b683ec)
|
|
|
|
|
|
| |
This allows us to avoid some extra zeroing.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
We already have a helper for doing this in BLORP, this just moves the
logic into ISL where we can share it with other components.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
| |
This makes it much easier to edit the template and doesn't really dirty
the python all that much.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.
v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Use the performance warning infrastructure to provide helpful
information when testing applications.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.
v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
v2: Rewrite functions, change location of synchronization.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.
v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2: Rewrite functions.
v3 (Jason Ekstrand):
- Don't set ResourceMinLOD.
- Fix clamp of level_count.
Signed-off-by: Nanley Chery <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|