| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.
In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."
So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.
We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.
Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..
v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)
Found by Caio Marcelo de Oliveira Filho
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <[email protected]>
Cc: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: Jason Ekstrand <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 62f37ee53d9d5388eecef94369893b5467349306)
|
|
|
|
|
|
|
|
|
| |
Drawable and readable need to either both be None or both be non-None.
Cc: <[email protected]>
Signed-off-by: Adam Jackson <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
(cherry picked from commit d257ec01360be05a745bbb851f08e944bcb23718)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):
"r127 must not be used for return address when there is a src and
dest overlap in send instruction."
v2: Style fixes (Matt Turner)
Reviewed-by: Matt Turner <[email protected]>
Cc: 18.1 <[email protected]>
(cherry picked from commit 0e47ecb29acf8bdd213236d7306c47f8ec0e937f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.
For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.
If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.
This fixes CTS tests that raised this issue as they were executed as SIMD8:
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
Shader-db results on Skylake:
total instructions in shared programs: 7686798 -> 7686797 (<.01%)
instructions in affected programs: 301 -> 300 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 337092322 -> 337091919 (<.01%)
cycles in affected programs: 22420415 -> 22420012 (<.01%)
helped: 712
HURT: 588
Shader-db results on Broadwell:
total instructions in shared programs: 7658574 -> 7658625 (<.01%)
instructions in affected programs: 19610 -> 19661 (0.26%)
helped: 3
HURT: 4
total cycles in shared programs: 340694553 -> 340676378 (<.01%)
cycles in affected programs: 24724915 -> 24706740 (-0.07%)
helped: 998
HURT: 916
total spills in shared programs: 4300 -> 4311 (0.26%)
spills in affected programs: 333 -> 344 (3.30%)
helped: 1
HURT: 3
total fills in shared programs: 5370 -> 5378 (0.15%)
fills in affected programs: 274 -> 282 (2.92%)
helped: 1
HURT: 3
v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignation to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: 18.1 <[email protected]>
(cherry picked from commit 232ed8980217dd65ab0925df28156f565b94b2e5)
|
|
|
|
|
|
|
|
|
|
|
| |
"sampler2DRect" and "sampler2DRectShadow" are specified as
reserved from GLSL 1.1 and GLSL ES 1.0
Signed-off-by: zhaowei yuan <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106906
Reviewed-by: Eric Anholt <[email protected]>
Fixes: 34f7e761bc61 ("glsl/parser: Track built-in types using the glsl_type directly")
(cherry picked from commit 73ec437627448466b2d3da3adc74310ccd4f41e7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs. In both cases, the accumulator is written
by the first of the two instructions and read by the second. Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed so we need to make the scheduler
aware. Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between another an instruction which writes the
accumulator and another which tries to use that result.
Cc: [email protected]
Reviewed-by: Matt Turner <[email protected]>
(cherry picked from commit 566e6abd6d70266aea2f43ad9fefaf7718d76c57)
Rebased version provided by Jason
|
|
|
|
|
|
|
| |
Fixes: 7987d041fda0c9 ("i965/surface_state: Emit the clear color address instead of value.")
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
(cherry picked from commit 420bf14e12f7e55637cfc79da17990bb4f150288)
|
|
|
|
|
|
|
|
|
|
|
| |
Ported from i965 including the comment.
This fixes:
dEQP-EGL.functional.reusable_sync.valid.wait_server
Cc: 18.1 <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
(cherry picked from commit 0eaf069679ccf86de6739d5eaa439db075f02903)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes new piglit tests:
- glsl-1.10/execution/fs-sign-neg-abs.shader_test
- glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <[email protected]>
Cc: [email protected]
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 88bd37c01060169b451ca2c3900830342d34a9a2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is achived by copying the sign(abs(x)) optimization from the FS
backend.
On Gen7 an earlier platforms, this fixes new piglit tests:
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <[email protected]>
Cc: [email protected]
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 9626ea497de8af5580ee3af76df79ad8083c5922)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the time of commit 7bc6e455e23 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible. We
were only thinking about type converting moves from D to F, for
example. However, type converting moves w/saturate from F to DF are
definitely possible. This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.
Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <[email protected]>
Cc: [email protected]
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit f8e54d02f79057f679302c06847066edc3ae7aa7)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fold_assoc() called from fold_alu_op3() can lower the number of src to 2,
which then leads to an invalid access to n.src[2]->gvalue().
This didn't seem to have caused much harm in the past, but on Fedora 28
it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although
with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was
needed to show the issue).
An alternative fix would be to instead call fold_alu_op2() from within
fold_assoc() when the number of src is reduced and return always TRUE
from fold_assoc() in this case, with the only actual difference being
the return value from fold_alu_op3() then. I'm not sure what the return
value actually should be in this case (or whether it even can make a
difference).
https://bugs.freedesktop.org/show_bug.cgi?id=106928
Cc: [email protected]
Reviewed-by: Dave Airlie <[email protected]>
(cherry picked from commit 817efd89685efc6b5866e09cbdad01c4ff21c737)
|
|
|
|
|
|
|
|
|
| |
For merged shaders, VS as HS for example.
Cc: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit 85865dbe0d96f18ac768b4063da94f52ee67a7fd)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In 6f5abf31466aed this code was fixed to calculate the maximum size of
an attribute in a seperate pass and then allocate the registers to
that size. However this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:
layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];
Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.
This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.
Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Fixes: 6f5abf31466 "i965: Fix output register sizes when multiple variables
share a slot."
(cherry picked from commit 2d5ddbe960f7c62a8f00d5e800925865f115970f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current code causes:
/usr/include/c++/8/debug/safe_iterator.h:207:
Error: attempt to copy from a singular iterator.
This is due to the iterators getting invalidated, fix the
reverse iterator to use the return value from erase, and
cast it properly.
(used Mathias suggestion)
Cc: <[email protected]>
Reviewed-by: Mathias Fröhlich <[email protected]>
(cherry picked from commit 8c51caab2404c5c9f5211936d27e9fe1c0af2e7d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need to add loop terminators to the list in the order we come
across them otherwise if two or more have the same exit condition
we will select that last one rather than the first one even though
its unreachable.
This fix is for simple unrolls where we only have a single exit
point. When unrolling these type of loops the unreachable
terminators and their unreachable branch are removed prior to
unrolling. Because of the logic change we also switch some
list access in the complex unrolling logic to avoid breakage.
Fixes: 6772a17acc8e ("nir: Add a loop analysis pass")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 463f849097193ad20e7622ddd740fd15b96f4277)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compatible
Otherwise we can incorrectly cmod propagate in situations like
add(8) g10<1>.xD g2<0>.xD -16D
...
cmp.ge.f0(8) null<1>D g2<0>.xD 16D
...
(+f0) sel(8) g21<1>.xyUD g14<4>.xyyyUD g18<4>.xyyyUD
Sadly, this change hurts quite a few shaders.
v2: Refactor writemask compatibility check into a separate function.
Suggested by Caio.
Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12968489 -> 12968738 (<.01%)
instructions in affected programs: 60679 -> 60928 (0.41%)
helped: 0
HURT: 249
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.44% 0.48%
Instructions are HURT.
total cycles in shared programs: 409171965 -> 409172317 (<.01%)
cycles in affected programs: 260056 -> 260408 (0.14%)
helped: 0
HURT: 176
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.16% 0.18%
Cycles are HURT.
Sandy Bridge
total instructions in shared programs: 10423577 -> 10423753 (<.01%)
instructions in affected programs: 40667 -> 40843 (0.43%)
helped: 0
HURT: 176
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.46% 0.51%
Instructions are HURT.
total cycles in shared programs: 146097503 -> 146097855 (<.01%)
cycles in affected programs: 503990 -> 504342 (0.07%)
helped: 0
HURT: 176
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.11% 0.13%
Cycles are HURT.
No changes on any other platforms.
Signed-off-by: Ian Romanick <[email protected]>
Fixes: cd635d149b2 i965/vec4: Propagate conditional modifiers from compares to adds
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
(cherry picked from commit 995d9937103771d9318124b91adfd20d7c6d5fed)
|
|
|
|
|
|
|
|
|
| |
Shaders that need special code for external samplers were broken if
they were loaded from the cache.
Cc: 18.1 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 99c6cae2278011309b7ca3d4735c7b341cbb4eef)
|
|
|
|
|
|
|
|
|
|
| |
If we have to re-emit push constant data, we need to re-emit all
of it.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
CC: <[email protected]>
(cherry picked from commit 198a72220b63e812e8b853cb5caa088d93720e7d)
|
|
|
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
CC: <[email protected]>
(cherry picked from commit 6a1d8350c91eed4ab10569683902a0fea4c048c5)
|
|
|
|
|
|
|
|
|
|
|
| |
Every time we emit a new state base address we will need to re-emit our
binding tables, since they might have been emitted with a different base
state adress.
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
CC: <[email protected]>
(cherry picked from commit 1b54824687df5170e1dd5ab701b2b76da299b851)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we just hashed the entire descriptor set layout verbatim.
This meant that a bunch of extra stuff such as pointers and reference
counts made its way into the cache. It also meant that we weren't
properly hashing in the Y'CbCr conversion information information from
bound immutable samplers.
Cc: [email protected]
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit d1c778b362d3ccf203f33095bee2af45dc8cde9a)
|
|
|
|
|
|
|
|
|
| |
Cc: 18.1 <[email protected]>
(cherry picked from commit 41f80373b46604f585497086f971a43aeea7f0c1)
Conflicts fixed by Dylan
Conflicts:
src/gallium/drivers/radeonsi/si_blit.c
|
|
|
|
|
|
|
|
|
|
|
|
| |
The spec allows adding scalars with a vector or matrix. In this case
the opt was losing swizzle and size information.
This fixes a bug with Doom (2016) shaders.
Fixes: 34ec1a24d61f ("glsl: Optimize (x + y cmp 0) into (x cmp -y).")
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 2a5121bf355001e2c69ba05e8d9be4ed633c7bf4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset
could be set to a specific value. The IndirectPropagation pass expected
it to return whether the offset could be increased by a specific value,
which is what TargetNV50::insnCanLoadOffset() does.
Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812
("nvc0/ir: be careful about propagating very large offsets into const load")
Signed-off-by: Rhys Perry <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
Signed-off-by: Karol Herbst <[email protected]>
(cherry picked from commit 6bb0f87c6003e1d80aa79f6a591620aecc7b031d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0d905597f fixed an issue with the placement of the zip and unzip
instructions. However, as a side-effect, it reversed the order in which
we were emitting the split instructions so that they went from high
group to low instead of low to high. This is fine for most things like
texture instructions and the like but certain render target writes
really want to be emitted low to high. This commit just switches the
order back around to be low to high.
Reviewed-by: Matt Turner <[email protected]>
Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
(cherry picked from commit d5b617a28e89fda62fb6cceec10686b0bb4b4fb2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a parallel make build issue in src/egl/drivers/dri2/
for wayland builds. Can be reproduced with:
$ rm src/egl/drivers/dri2/*.h src/egl/drivers/dri2/platform_wayland.lo
$ make -C src/egl/ drivers/dri2/platform_wayland.lo
../../../mesa-18.1.2/src/egl/drivers/dri2/platform_wayland.c:50:10: fatal error: linux-dmabuf-unstable-v1-client-protocol.h: No such file or directory
This patch adds the missing dependency.
Fixes: 02cc359372773800de817 "egl/wayland: Use linux-dmabuf interface for buffers"
Reviewed-by: Eric Engestrom <[email protected]>
[Eric: fixed up the commit title]
Signed-off-by: Eric Engestrom <[email protected]>
(cherry picked from commit d7c4ce1d1d800a4721122a20b5a289951e7f4fbc)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
CC: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit 7a57c827675f3bceecd3b59968e9e5b37dcafcef)
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, if what gets passed into the function call is a deref chain
longer than just a variable deref, we would use the type of the entire
variable rather than the type of the thing being dereferenced.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106980
Reviewed-by: Ian Romanick <[email protected]>
(Unique to 18.1)
|
|
|
|
|
|
|
|
|
| |
Even though they don't have regular sources, they do have derefs and
those may have implied sources that should be handled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106980
Reviewed-by: Ian Romanick <[email protected]>
(Unique to 18.1)
|
|
|
|
|
|
|
|
|
|
|
| |
While XFB has been enabled for cache, we did not serialize enough
data for the whole API to work (such as glGetProgramiv).
Fixes: 6d830940f7 "Allow shader cache usage with transform feedback"
Signed-off-by: Tapani Pälli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106907
Reviewed-by: Jordan Justen <[email protected]>
(cherry picked from commit ab2643e4b06f63c93a57624003679903442634a8)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can not use the VUE Dereference flags combination for EOT
message under ILK and SNB because the threads are not initialized
there with initial VUE handle unlike Pre-IL.
So to avoid GPU hangs on SNB and ILK we need
to avoid usage of the VUE Dereference flags combination.
(Was tested only on SNB but according to the specification
SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
the ILK must behave itself in the similar way)
v2: Approach to fix this issue was changed.
Instead of different EOT flags in the program end
we will create VUE every time even if GS produces no output.
v3: Clean up the patch.
Signed-off-by: Andrii Simiklit <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
CC: <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Tested-by: Mark Janes <[email protected]>
(cherry picked from commit 232c5d75ea8c9536a896a17c9156b8e2ad36a779)
|
|
|
|
|
|
|
|
|
|
| |
From the Vulkan spec:
"If this is a primary command buffer, then this value is ignored."
CC: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit ba5e25ed293bc060be2ba7ed840a22458454b3cf)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the driver ends up by performing a slow depthstencil clear,
the HTILE metadata won't be initialized correctly.
This fixes random VM faults on Polaris while running CTS
with Bas's runner. This doesn't seem to regress performance.
CC: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit 07cb1373a23042de6904e918419bfa3963695795)
|
|
|
|
|
|
| |
Cc: 18.1 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit a2790b134a912b84389022b96ef0ef78a7d2b83c)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the recent rework of converting the shell script to a python one
the check for actual tests was dropped.
Bring that back, since it was explicitly added considering we had a ~2
year period, during which the tests were not run.
v2: use raise Exception() over print() & return false (Dylan)
Fixes: db8cd8e36771 ("glcpp/tests: Convert shell scripts to a python
script")
Cc: Dylan Baker <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
(cherry picked from commit d589eddc8be5240632d42ae1931b0b6a82ff524c)
|
|
|
|
|
|
| |
Fixes: c366f422f0a nir: Offset vertex_id by first_vertex instead of base_vertex
Signed-off-by: Rob Clark <[email protected]>
(cherry picked from commit e1e40935b4adb60e47e90e6d83589c369a26b6e2)
|
|
|
|
|
|
|
|
| |
This improves performance for certain games.
Cc: 18.1 <[email protected]>
Tested-by: Dieter Nützel <[email protected]>
(cherry picked from commit 9322974ec716b8c3b2e326559f663ff087daa38c)
|
|
|
|
|
|
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
(cherry picked from commit b81149e258a492ed0c81058fb535f6bfdacb36da)
Conflicts:
src/amd/common/ac_gpu_info.c
Conflicts resolved by Dylan
|
|
|
|
|
|
|
|
| |
We HTILE compress stencil-only surfaces too.
CC: 18.1 <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit 1a8501a9ddfff6c3eab47046e0e8a9dc17492bf0)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some fprintfs were probably left unintentionally a few years ago and are
a bit of a nuisance.
Fixes: 2d3301e4d513 ("virgl: fix reference counting of prime handles")
Cc: Rob Herring <[email protected]>
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
(cherry picked from commit 9b1cb50ba47346dd8fcb8f2f5d69125d33a4a66e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shouldn't make any functional difference, just that `liblibanv_gen90.a`
will now be called `libanv_gen90.a`.
Fixes: 3218056e0eb375eeda470 "meson: Build i965 and dri stack"
Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Dylan Baker <[email protected]>
(cherry picked from commit e8eb84826e30ffcb54f69a7aec9181f969418eb2)
Trivial merge conflicts resolved by Dylan.
|
|
|
|
|
|
|
| |
Fixes: 922cd38172b8a2bc286bd "radv: implement out-of-order rasterization when it's safe on VI+"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
(cherry picked from commit 4d08c1e7d15f7d2c0a406cf1c79314511778b38f)
|
|
|
|
|
|
|
|
|
| |
It's a bit late to round up after an integer division.
Fixes: de889794134e6245e08a2 "radv: Implement VK_AMD_shader_info"
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Alex Smith <[email protected]>
(cherry picked from commit d85fef1e34657fc082b9a763de9499d324fbeebf)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The primitive ID is NULL and this generates an invalid
select instruction which crashes because one operand is NULL.
This fixes crashes in The Long Journey Home, Quantum Break
and Just Cause 3 with DXVK.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756
CC: <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit 5917761e3dddc968d5ccac9b282b7cb8d3da866f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SSBOs can be written by a different GPU thread, copy propagating a
read can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
The same shader was helped by this patch and the previous.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399119 -> 14399113 (<.01%)
instructions in affected programs: 683 -> 677 (-0.88%)
helped: 1
HURT: 0
total cycles in shared programs: 532973113 -> 532971865 (<.01%)
cycles in affected programs: 524666 -> 523418 (-0.24%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
(cherry picked from commit 37bd9ccd21b860d2b5ffea7e1f472ec83b68b43b)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since SSBOs can be written by other GPU threads, copy propagating a read
can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399120 -> 14399119 (<.01%)
instructions in affected programs: 684 -> 683 (-0.15%)
helped: 1
HURT: 0
total cycles in shared programs: 532978931 -> 532973113 (<.01%)
cycles in affected programs: 530484 -> 524666 (-1.10%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Cc: [email protected]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
(cherry picked from commit 461a5c899c08064467abb635536381a5a5659280)
|
|
|
|
|
|
|
|
|
|
|
| |
BITSET_FFS(x) macro makes use of ARRAY_SIZE(x) macro which is
defined in util/macro.h. Include it directy to make usage more
straightforward.
Fixes: 692bd4a1ab9 ("util: replace Elements() with ARRAY_SIZE()")
Signed-off-by: Christian Gmeiner <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
(cherry picked from commit efae1279936112cefe9fa1753998993df81d6201)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This seems to have been missed in the move from autotools
This fixes the following build issue:
../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
#include <X11/Xlib-xcb.h>
^~~~~~~~~~~~~~~~
Fixes: b1b65397d0c4978e36a84c0a1c98a4bd6cb9588e
("meson: Build gallium auxiliary")
Reviewed-by: Dylan Baker <[email protected]>
(cherry picked from commit 1d92d6486a7685762f480fb33893b3c3db1fd21a)
|
|
|
|
|
|
|
|
|
|
|
|
| |
GLSL 4.60 offically added this but games and older CTS suites actually
had shaders that did this, we may as well enable it everywhere.
Adding stable because it appears apps in the wild do this.
Acked-by: Timothy Arceri <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: <[email protected]>
(cherry picked from commit babd1d526be4690204964f5e0a42f5df12f7f83b)
|