| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
We allocate GGTT entries and physical addresses are we create engines
rather than having a fixed layout.
Context images now receive a parameter argument which is used to setup
pml4 & ring buffer addresses.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We'll make them more parameterized in a later commit.
As this is just a transitional commit, we allow ourself to leak the
context images allocated in get_context_init(). We'll fix this in the
next commit.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
We want to use this allocator in the next commit for GGTT pages.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Prepare aub write to deal with multiple engine instances. We don't
pass the instance number yet this could be done in the future by
having a 2 dimensional array of struct engine.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to reuse the execlist submission, but won't need the ring
buffer update.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
In the future we'll want error2aub to reuse the context image saved by
i915 instead of the default one we write in intel_dump_gpu.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
IGT has a test to hang the GPU that works by having a batch buffer
jump back into itself, trigger an infinite loop on the command stream.
As our implementation of the decoding is "perfectly" mimicking the
hardware, our decoder also "hangs". This change limits the number of
batch buffer we'll decode before we bail to 100.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes leaks from anv_device_upload_nir:
==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836)
==7345== by 0x54E0848: grow_to_fit (blob.c:67)
==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091)
==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size. On the vs-isnan-dvec test from piglit:
Before this commit:
1684.63s user 17.29s system 99% cpu 28:28.24 total
101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
Peak memory usage (according to massif): 1.435 GB
After this commit:
179.64s user 7.75s system 99% cpu 3:07.92 total
57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
Peak memory usage (according to massif): 531.0 MB
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed. We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word
trick only works for even numbered bytes.
This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.
v2: Use a SHR instead of an AND. This saves an instruction compared to
using two moves. Suggested by Jason.
Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parameter is never used, and it's not part of a common interface
idiom. Remove it.
src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo)
^~~~~~~
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In file included from src/intel/compiler/brw_eu_util.c:34:0:
src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_desc_header_present(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_ex_desc(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
unsigned exec_size, /**< 0 for SIMD4x2 */
^~~~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
unsigned exec_size,
^~~~~~~~~
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
v2: Ignore the import if handleType == 0. (Jason)
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.
Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.
Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
| |
It has exactly one caller. Just inline it.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The descriptor set layout code in our driver has undergone many changes
over the years. Some of the fields which were once essential are now
useless or nearly so. The has_dynamic_offsets field was completely
unused accept for the code to set and hash it. The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc. The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.
This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is what we're actually storing in the descriptor set and consuming
when we bind surface states. This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
We emit the surface states for those on-the-fly so we don't need the
buffer view.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
Make them all take a device followed by a set. This is consistent
with how the actual Vulkan entrypoint parameters are laid out.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
|
|
|
|
|
|
|
|
| |
We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This does not seem to fix anything ATM but is the right thing todo.
Signed-off-by: Tapani Pälli <[email protected]>
Fixes: f3e91e78a33775 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver. We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header. Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.
Fixes: cb98e0755f8d "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
We were accidentally not counting those surfaces
Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.
Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended
v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
2) Add lower_mul_2x32_64 flag (Matt Turner)
3) Remove associative property as bit size is different (Connor
Abbott)
v3: Fix indentation and variable naming convention (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes undefined reference building errors for XML_* functions
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Cc: "19.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification
Fixes building errors during rebuils due to missing dependencies
(v2) Fixed a missing $(VULKAN_API_XML) reference
Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec7 ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: "19.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
| |
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.
Reviewed-by: Kenneth Graunke <[email protected]>
Fixes: 7725d609387 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As requested by Ken ;)
v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)
v3: Make binding/instruction/state/surface available (Lionel)
v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)
v5: Clear decoded batch buffer var after use (Caio)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|