| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
v2: by J.Ekstrand suggestion moved lowering of large
constants after lowering of copy_deref is done.
CC: Jason Ekstrand <[email protected]>
CC: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111450
Signed-off-by: Sergii Romantsov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This option strictly allocate the minImageCount given by the
application at swapchain creation.
This works around application that do not deal with the fact that the
implementation allocates more images than the minimum specified.
v2: Add values in default drirc (Bas)
v3: specify engine name/version (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111522
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: 19.2 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vulkan applications can register with the following structure :
typedef struct VkApplicationInfo {
VkStructureType sType;
const void* pNext;
const char* pApplicationName;
uint32_t applicationVersion;
const char* pEngineName;
uint32_t engineVersion;
uint32_t apiVersion;
} VkApplicationInfo;
This enables the Vulkan implementations to apply workarounds based off
matching this description.
Here we add a new parameter for matching the driconfig options with
the following :
<device driver="anv">
<application engine_name_match="MyOwnEngine.*" engine_versions="10:12,40:42">
<option name="blaaah" value="true" />
</application>
</device>
v2: switch engine name match to use regexps
v3: Verify that the regexec returns REG_NOMATCH for match failure (Eric)
v4: Add missing bit that went to the following commit (Eric)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Cc: 19.2 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the UNDEF instruction was added, we didn't do anything special in
split_virtual_grfs. This mean that anything with an UNDEF wasn't
getting split which causes problems for the compiler. Among other
things, it makes RA harder because things are in bigger chunks. It also
meant that dvec4s weren't getting split which means that they are larger
than the maximum register size.
Shader-db results on Kaby Lake:
total instructions in shared programs: 14959202 -> 14960035 (<.01%)
instructions in affected programs: 96197 -> 97030 (0.87%)
helped: 140
HURT: 128
helped stats (abs) min: 1 max: 17 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.09% max: 6.15% x̄: 0.65% x̃: 0.45%
HURT stats (abs) min: 1 max: 825 x̄: 8.28 x̃: 1
HURT stats (rel) min: 0.13% max: 139.83% x̄: 1.70% x̃: 0.50%
95% mean confidence interval for instructions value: -2.96 9.18
95% mean confidence interval for instructions %-change: -0.56% 1.51%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4372 -> 4372 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 352646771 -> 352840997 (0.06%)
cycles in affected programs: 218600800 -> 218795026 (0.09%)
helped: 21167
HURT: 21411
helped stats (abs) min: 1 max: 2924 x̄: 36.89 x̃: 10
helped stats (rel) min: <.01% max: 41.90% x̄: 2.97% x̃: 0.98%
HURT stats (abs) min: 1 max: 26027 x̄: 45.54 x̃: 10
HURT stats (rel) min: <.01% max: 324.46% x̄: 3.88% x̃: 1.06%
95% mean confidence interval for cycles value: 2.87 6.26
95% mean confidence interval for cycles %-change: 0.40% 0.55%
Cycles are HURT.
total spills in shared programs: 8840 -> 8953 (1.28%)
spills in affected programs: 126 -> 239 (89.68%)
helped: 1
HURT: 2
total fills in shared programs: 21782 -> 21914 (0.61%)
fills in affected programs: 431 -> 563 (30.63%)
helped: 1
HURT: 3
LOST: 0
GAINED: 5
Shader-db results on Haswell:
total instructions in shared programs: 13320918 -> 13320769 (<.01%)
instructions in affected programs: 40998 -> 40849 (-0.36%)
helped: 146
HURT: 56
helped stats (abs) min: 1 max: 8 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.16% max: 8.60% x̄: 2.52% x̃: 2.22%
HURT stats (abs) min: 2 max: 23 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.21% max: 10.26% x̄: 6.83% x̃: 10.26%
95% mean confidence interval for instructions value: -1.26 -0.21
95% mean confidence interval for instructions %-change: -0.62% 0.77%
Inconclusive result (%-change mean confidence interval includes 0).
total loops in shared programs: 4373 -> 4373 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 374518258 -> 374384193 (-0.04%)
cycles in affected programs: 231101954 -> 230967889 (-0.06%)
helped: 21427
HURT: 19438
helped stats (abs) min: 1 max: 2035 x̄: 31.09 x̃: 8
helped stats (rel) min: <.01% max: 40.95% x̄: 2.42% x̃: 0.86%
HURT stats (abs) min: 1 max: 20875 x̄: 27.38 x̃: 8
HURT stats (rel) min: <.01% max: 59.09% x̄: 2.49% x̃: 0.80%
95% mean confidence interval for cycles value: -4.49 -2.07
95% mean confidence interval for cycles %-change: -0.14% -0.04%
Cycles are helped.
total spills in shared programs: 23406 -> 23411 (0.02%)
spills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
total fills in shared programs: 34845 -> 34850 (0.01%)
fills in affected programs: 3 -> 8 (166.67%)
helped: 0
HURT: 2
LOST: 0
GAINED: 0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111566
Fixes: f4ef34f207d1 "intel/fs: Add an UNDEF instruction to avoid..."
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
| |
Initial benchmarking didn't show any performance benefits. But it might eventually.
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change needed to fix the following building error:
In file included from external/mesa/src/intel/vulkan/anv_device.c:43:
external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 4dcb1ff ("anv: add support for driconf")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.
In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears. i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This isn't known to fix any current bugs but it does prevent a
regression in a subsequent commit.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No option is supported yet, this is just the boilerplate.
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Fixes: 9a129510f56f "anv: Bump maxComputeWorkgroupInvocations"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111552
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bit redirects the state cache from the unified/RO sections of the
L3 cache to the "CS command buffer" section of the cache, which would
be set up via TCCNTLREG. The documentation says:
"Additionaly, this redirection should be enabled only if there is a
non-zero allocation for the CS command buffer section."
We don't allocate any cache to the CS command buffer section, so
enabling this redirection effectively disabled the state cache.
The Windows driver only sets up that section when using POSH, which
we do not currently use. So, leave it unallocated and disable the
redirection to get a functional state cache again.
Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%,
and Car Chase by 2%.
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that
we're doing interpolation lowering in NIR, we can continue to stride the
FS input registers directly in the brw_fs_nir code like we did before.
This fixes SIMD32 fragment shaders which broke because lower_simd_width
depended on the 0 stride to split PLN instructions correctly.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things. First, it simplifies the way we compute
the FB write group bit. There's no reason to use a ternary because
inst->group / 16 can only be 0 or 1. Second, it fixes an order-of-
operations bug where the ternary wasn't selecting between (1 << 11) and
0 but between (1 << 11) and 0 | brw_dp_write_desc(...).
Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes"
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
| |
Fixes: 9775894f102535a79186 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a field name differs slightly between two generations then this
change will still add the fields into the same group.
For example, these will be treated as equal:
* "Software Exception" and "Software Exception"
* "Per Thread" and "Per-Thread"
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
See the previous commit for the explanation of the Fixes tag.
Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal
Engine 4 tech demos.
Reviewed-by: Matt Turner <[email protected]>
Fixes: 7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen11 adds support for specifying the render target index and src0
alpha present bits in the extended message descriptor. Previously,
we had to use a message header for this, requiring extra instructions
to write the fields, and two registers of extra payload.
Improves performance on my ICL 8x8 frequency locked to 700Mhz, on iris:
GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5)
GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5)
Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff
earlier in the visitor. It will also make it easier to generate SENDSC
messages with indirect extended descriptors in a few patches.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This will be used by visitor code to convert directly to SEND in a bit.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding. So we can't use an immediate
and have to fall back to an indirect extended descriptor.
Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0. So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.
When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:
INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4
For the curious, the message we're getting is:
CS compile failed: Failure to register allocate. Reduce number
of live scalar values to avoid this.
Fixes: 864737ce6cd5 ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:
- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array
Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.
AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing. So it should
be harmless to disable it.
The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to. Perhaps they simply haven't run
into this issue.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Jason suggested I remove this in review, and he's right. AFAICT this
affects blending, and that just isn't going to happen on buffers.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On commit f6e7de41d7b, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.
This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.
v2: Improve commit message and add documentation about skipped checks
(Jason)
Fixes: f6e7de41d7b ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Initialize `next_batch_addr` and `second_level`. If the batch is well
formed, those values will be overriden, if not, they are as good as
uninitialized garbage.
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Leftover from 021fa28163a ("xintel/nir: Add a helper for getting
BRW_AOP from an intrinsic").
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiler can't see that d is initialized.
../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
984 | d = MAX2(-d, d);
Assert that we expect at least one component -- hence d going to be
set. That by itself is not enough, so also zero initialize the
variable.
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
So many duplicated switch statements....
Reviewed-by: Kenneth Graunke <[email protected]>
|