| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes leaks for each glsl_type generated:
==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18
==32470== at 0x483880B: malloc (vg_replace_malloc.c:309)
==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119)
==32470== by 0x4C44014: rzalloc_size (ralloc.c:151)
==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215)
==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:114)
==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:1146)
==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501)
==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269)
==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018)
==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365)
==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490)
==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173)
v2: move release call to vkDestroyInstance
Signed-off-by: Tapani Pälli <[email protected]>
Cc: [email protected]
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes following leak:
==21853== 32 bytes in 1 blocks are definitely lost in loss record 2 of 20
==21853== at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==21853== by 0x4C4DD7F: util_vma_heap_free (vma.c:221)
==21853== by 0x4C4D647: util_vma_heap_init (vma.c:46)
==21853== by 0x4957B9F: anv_CreateDescriptorPool (anv_descriptor_set.c:578)
Fixes: c520f4dec9cb ("anv: Add a concept of a descriptor buffer")
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch maintains a list of sets in the pool and destroys possible
remaining sets when pool is destroyed.
As stated in Vulkan spec:
"When a pool is destroyed, all descriptor sets allocated from
the pool are implicitly freed and become invalid."
This fixes memory leaks spotted with valgrind:
==19622== 96 bytes in 1 blocks are definitely lost in loss record 2 of 3
==19622== at 0x483880B: malloc (vg_replace_malloc.c:309)
==19622== by 0x495B67E: default_alloc_func (anv_device.c:547)
==19622== by 0x4955E05: vk_alloc (vk_alloc.h:36)
==19622== by 0x4956A8F: anv_multialloc_alloc (anv_private.h:538)
==19622== by 0x4956A8F: anv_CreateDescriptorSetLayout (anv_descriptor_set.c:217)
Fixes: 14f6275c92f1 ("anv/descriptor_set: add reference counting for descriptor set layouts")
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
| |
This information will be used by the vkpipeline-db tool.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Now that nir_opt_copy_prop_vars can properly handle array derefs on
vectors, it's safe to move UBO and SSBO lowering to late in the
pipeline. This should allow NIR to actually start optimizing SSBO
access.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It doesn't really matter where this pass goes as long as it's after we
call nir_lower_explicit_io and before we go into the back-end. Putting
it brw_postprocess_nir lets us move nir_lower_explicit_io significantly
later in the pipeline.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to be used for OpenGL (right now for ARB_gl_spirv).
This commit adds two new structures:
* nir_xfb_varying_info: that identifies each individual varying. For
each one, we need to know the type, buffer and xfb_offset
* nir_xfb_buffer_info: as now for each buffer, in addition to the
stride, we need to know how many varyings are assigned to it.
For this patch, the only case where num_outputs != num_varyings is
with the case of doubles, that for dvec3/4 could require more than one
output. There are more cases though (like aoa), that will be handled
on following patches.
v2: updated after new nir general XFB support introduced for "anv: Add
support for VK_EXT_transform_feedback"
v3: compute num_varyings beforehand for allocating, instead of relying
on num_outputs as approximate value (Timothy Arceri)
Reviewed-by: Timothy Arceri <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
When Guc is enabled, the error state will contain a "global" buffer
for the GuC log buffer.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
We don't write them in the aub file yet.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
Found out that some base64 data matched the '---' identifier. We can
avoid this by adding the surrounding spaces.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
The error state contains several kind of BOs, including the context
image which we will want to write in a later commit. Because it can
come later in the error state than the user buffers and because we
need to write it first in the aub file, we have to first build a list
of BOs and then write them in the appropriate order.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.
Reviewed-by: Karol Herbst <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
| |
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We allocate GGTT entries and physical addresses are we create engines
rather than having a fixed layout.
Context images now receive a parameter argument which is used to setup
pml4 & ring buffer addresses.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We'll make them more parameterized in a later commit.
As this is just a transitional commit, we allow ourself to leak the
context images allocated in get_context_init(). We'll fix this in the
next commit.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
We want to use this allocator in the next commit for GGTT pages.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Prepare aub write to deal with multiple engine instances. We don't
pass the instance number yet this could be done in the future by
having a 2 dimensional array of struct engine.
Signed-off-by: Lionel Landwerlin <[email protected]>
Acked-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to reuse the execlist submission, but won't need the ring
buffer update.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
| |
In the future we'll want error2aub to reuse the context image saved by
i915 instead of the default one we write in intel_dump_gpu.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
IGT has a test to hang the GPU that works by having a batch buffer
jump back into itself, trigger an infinite loop on the command stream.
As our implementation of the decoding is "perfectly" mimicking the
hardware, our decoder also "hangs". This change limits the number of
batch buffer we'll decode before we bail to 100.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
| |
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
| |
Some commands like MI_BATCH_BUFFER_START have this indicator.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Rafael Antognolli <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes leaks from anv_device_upload_nir:
==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836)
==7345== by 0x54E0848: grow_to_fit (blob.c:67)
==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091)
==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size. On the vs-isnan-dvec test from piglit:
Before this commit:
1684.63s user 17.29s system 99% cpu 28:28.24 total
101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
Peak memory usage (according to massif): 1.435 GB
After this commit:
179.64s user 7.75s system 99% cpu 3:07.92 total
57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
Peak memory usage (according to massif): 531.0 MB
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed. We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word
trick only works for even numbered bytes.
This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.
v2: Use a SHR instead of an AND. This saves an instruction compared to
using two moves. Suggested by Jason.
Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Fixes: 6ac2d169019 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parameter is never used, and it's not part of a common interface
idiom. Remove it.
src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo)
^~~~~~~
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In file included from src/intel/compiler/brw_eu_util.c:34:0:
src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’:
src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_desc_header_present(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’:
src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_ex_desc(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’:
src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’:
src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc)
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’:
src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_sampler_desc_return_format(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’:
src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo,
^~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’:
src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
unsigned exec_size, /**< 0 for SIMD4x2 */
^~~~~~~~~
src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’:
src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter]
unsigned exec_size,
^~~~~~~~~
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Replace was done using:
find ./src -type f -exec sed -i -- \
's/is_record(/is_struct(/g' {} \;
Acked-by: Karol Herbst <[email protected]>
Acked-by: Jason Ekstrand <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
v2: Ignore the import if handleType == 0. (Jason)
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
| |
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.
Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.
Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|