| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case where the stencil clear is nicely aligned, we can clear
stencil much more efficiently by mapping it as a wide format (say
RGBA32_UINT) and blasting out the stencil clear value with a repclear.
On Unigine Heaven, this makes one stencil clear go from non-trivial to
unnoticeable when looking at per-draw timings.
In order for this change to work properly, ANV needs to do a bit more
flushing around depth and stencil clears. i965 and iris already have
the cache tracking logic to handle this so no changes are required
there.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This isn't known to fix any current bugs but it does prevent a
regression in a subsequent commit.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
No option is supported yet, this is just the boilerplate.
Cc: [email protected]
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Fixes: 9a129510f56f "anv: Bump maxComputeWorkgroupInvocations"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111552
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bit redirects the state cache from the unified/RO sections of the
L3 cache to the "CS command buffer" section of the cache, which would
be set up via TCCNTLREG. The documentation says:
"Additionaly, this redirection should be enabled only if there is a
non-zero allocation for the CS command buffer section."
We don't allocate any cache to the CS command buffer section, so
enabling this redirection effectively disabled the state cache.
The Windows driver only sets up that section when using POSH, which
we do not currently use. So, leave it unallocated and disable the
redirection to get a functional state cache again.
Improves performance in Civilization VI by 18%, Manhattan 3.0 by 6%,
and Car Chase by 2%.
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that
we're doing interpolation lowering in NIR, we can continue to stride the
FS input registers directly in the brw_fs_nir code like we did before.
This fixes SIMD32 fragment shaders which broke because lower_simd_width
depended on the 0 stride to split PLN instructions correctly.
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things. First, it simplifies the way we compute
the FB write group bit. There's no reason to use a ternary because
inst->group / 16 can only be 0 or 1. Second, it fixes an order-of-
operations bug where the ternary wasn't selecting between (1 << 11) and
0 but between (1 << 11) and 0 | brw_dp_write_desc(...).
Fixes: 0d9648416 "intel/compiler: Use generic SEND for Gen7+ FB writes"
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.
Reviewed-by: Qiang Yu <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Vasily Khoruzhick <[email protected]>
|
|
|
|
|
|
| |
Fixes: 9775894f102535a79186 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Acked-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a field name differs slightly between two generations then this
change will still add the fields into the same group.
For example, these will be treated as equal:
* "Software Exception" and "Software Exception"
* "Per Thread" and "Per-Thread"
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
See the previous commit for the explanation of the Fixes tag.
Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal
Engine 4 tech demos.
Reviewed-by: Matt Turner <[email protected]>
Fixes: 7afa26d4e39 ("nir: Add lowering for nir_op_bitfield_reverse.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gen11 adds support for specifying the render target index and src0
alpha present bits in the extended message descriptor. Previously,
we had to use a message header for this, requiring extra instructions
to write the fields, and two registers of extra payload.
Improves performance on my ICL 8x8 frequency locked to 700Mhz, on iris:
GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5)
GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5)
Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff
earlier in the visitor. It will also make it easier to generate SENDSC
messages with indirect extended descriptors in a few patches.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This will be used by visitor code to convert directly to SEND in a bit.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding. So we can't use an immediate
and have to fall back to an indirect extended descriptor.
Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0. So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.
When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:
INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4
For the curious, the message we're getting is:
CS compile failed: Failure to register allocate. Reduce number
of live scalar values to avoid this.
Fixes: 864737ce6cd5 ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Paulo Zanoni <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:
- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array
Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.
AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing. So it should
be harmless to disable it.
The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to. Perhaps they simply haven't run
into this issue.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Jason suggested I remove this in review, and he's right. AFAICT this
affects blending, and that just isn't going to happen on buffers.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On commit f6e7de41d7b, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.
This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.
v2: Improve commit message and add documentation about skipped checks
(Jason)
Fixes: f6e7de41d7b ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Initialize `next_batch_addr` and `second_level`. If the batch is well
formed, those values will be overriden, if not, they are as good as
uninitialized garbage.
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Leftover from 021fa28163a ("xintel/nir: Add a helper for getting
BRW_AOP from an intrinsic").
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiler can't see that d is initialized.
../src/intel/compiler/brw_vec4_nir.cpp: In function ‘int brw::try_immediate_source(const nir_alu_instr*, brw::src_reg*, bool, const gen_device_info*)’:
../src/intel/compiler/brw_vec4_nir.cpp:984:12: warning: ‘d’ may be used uninitialized in this function [-Wmaybe-uninitialized]
984 | d = MAX2(-d, d);
Assert that we expect at least one component -- hence d going to be
set. That by itself is not enough, so also zero initialize the
variable.
Acked-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
So many duplicated switch statements....
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
| |
In a descriptor set inline uniform blocks don't use up any bindings.
However, the presence of any inline uniform blocks doed require the
use of the descriptor buffer, which takes up one binding.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improves performance on my Icelake 8x8 locked to 700Mhz. For example,
some GfxBench5 subtests have the following results:
- [i965] gl_manhattan: ................ 7.01119% +/- 0.180971% (n=5)
- [i965] gl_4 (Car Chase): 4.24351% +/- 0.175622% (n=5)
- [i965] gl_blending: ................ 3.36327% +/- 0.180267% (n=5)
- [i965] gl_5_normal (Aztec Ruins): 1.67962% +/- 0.243534% (n=10)
- [iris] gl_manhattan: ................ 3.92357% +/- 0.073965% (n=25)
- [iris] gl_4 (Car Chase): 2.17746% +/- 0.0826858% (n=5)
- [iris] gl_blending: ................ 2.79599% +/- 0.803652% (n=15)
- [iris] gl_5_normal (Aztec Ruins): 1.30930% +/- 0.106523% (n=25)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When subslices_delta == 0 and we take the early return,
device->slice_hash is not initialized on GEN11. It then causes a
segfault when going through anv_DestroyDevice, if compiled with
valgrind.
Fixes: 7bc022b4bbc ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are
unbalanced.)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
CID: 1452261
Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"
Signed-off-by: Danylo Piliaiev <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `device` can be set earlier either by a command line or a by
intercepting an ioctl call to get the I915_PARAM_CHIPSET_ID done by
the application early. In both cases `aub_file` and `devinfo` would
not be initialized.
Fix by splitting the conditions
- `device == 0`: use the FD to get both device and devinfo.
- Or `devinfo.gen == 0`: use `device` to initialize it.
And separatedly, initialize aub_file the first time it is needed.
Fixes: d594d2a0524 ("intel/tools: use device info initializer")
Acked-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
If the pixel pipes have a different number of subslices, emit a slice
hashing table that will ensure proper workload distribution.
v2: Don't need to set the mask - it's mbo (Ken).
|