Commit message | Author | Age | Files | Lines
* freedreno: a2xx: fix gmem2mem viewport | Jonathan Marek | 2019-01-21 | 1 | -0/+7
    Fixes cases where previous viewport values might cause gmem2mem to fail.
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno: a2xx: cleanup REG_A2XX_PA_CL_VTE_CNTL | Jonathan Marek | 2019-01-21 | 2 | -18/+20
    Doesn't change much, but reduces the size of fd2_emit_state.
    gmem2mem does not need to change the value: no Z clipping on resolve.
    mem2gmem now needs to restore the common value after rendering.
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* freedreno: a2xx: cleanup init_shader_const | Jonathan Marek | 2019-01-21 | 3 | -14/+11
    Only 3 vertices are used, so we can drop the data for vertex 4.
    It doesn't make sense to have 1.1 for some coordinates; use 1.0 instead.
    Signed-off-by: Jonathan Marek <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* nir: add bit_size parameter to system values with multiple allowed bit sizes | Karol Herbst | 2019-01-21 | 5 | -9/+19
    v2: add assert to verify we have at least one valid bit_size
    v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: add legal bit_sizes to intrinsics | Karol Herbst | 2019-01-21 | 4 | -13/+30
    With OpenCL some system values match the address bits, but in GLSL we also have some system values that are 64 bit, like subgroup masks.
    With this it is possible to adjust the builder functions so that, depending on the bit_sizes, the correct bit_size is used or an additional argument is added in case of multiple possible values.
    v2: validate dest bit_size
    v3: generate hex values in python code
        remove useless imports
        rename and move bit_sizes
    v4: add 1 to legal bit_sizes for front_face
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir/validate: allow to check against a bitmask of bit_sizes | Karol Herbst | 2019-01-21 | 1 | -17/+17
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
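A minimal sketch of checking a bit size against a bitmask of legal sizes, as the commits above describe. It assumes the mask is simply the bitwise OR of the legal sizes themselves (for example 32 | 64), which works because every size in question is a distinct power of two. The function name `bit_size_is_legal` is hypothetical and is not taken from the Mesa tree.

```c
#include <stdbool.h>

/* Hypothetical helper, not Mesa's validator. Assumption: legal_mask is the
 * bitwise OR of the allowed sizes, e.g. (32 | 64), or (1 | 32) for a value
 * that may be a 1-bit boolean or a 32-bit integer. */
static bool
bit_size_is_legal(unsigned bit_size, unsigned legal_mask)
{
   /* Reject zero and anything that is not a single power of two, so that
    * a value such as 48 (32 | 16) cannot slip through the mask test. */
   if (bit_size == 0 || (bit_size & (bit_size - 1)) != 0)
      return false;

   return (bit_size & legal_mask) != 0;
}
```

With this encoding, `bit_size_is_legal(32, 32 | 64)` holds while `bit_size_is_legal(16, 32 | 64)` does not, and adding `1` to a mask admits 1-bit values.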
* nir: replace more nir_load_system_value calls with builder functions | Karol Herbst | 2019-01-21 | 5 | -14/+12
| | | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* glsl/lower_output_reads: set invariant and precise flags on temporaries | Karol Herbst | 2019-01-21 | 1 | -0/+4
    Fixes a couple of deqp tests (on nvc0 and potentially other drivers):
        dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1
        dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2
        dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3
        dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1
        dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2
        dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3
        dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1
        dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2
        dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3
    CC: <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add missing CAPs for unsupported features | Rhys Kidd | 2019-01-20 | 2 | -0/+4
| | | | | Signed-off-by: Rhys Kidd <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nir/spirv: handle SpvStorageClassCrossWorkgroup | Karol Herbst | 2019-01-19 | 5 | -0/+12
| | | | | | | | | | v2: rename nir_var_global to nir_var_mem_global Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_shared to nir_var_mem_shared | Karol Herbst | 2019-01-19 | 11 | -23/+23
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_ssbo to nir_var_mem_ssbo | Karol Herbst | 2019-01-19 | 15 | -37/+37
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_ubo to nir_var_mem_ubo | Karol Herbst | 2019-01-19 | 10 | -14/+14
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_function to nir_var_function_temp | Karol Herbst | 2019-01-19 | 28 | -81/+81
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* nir: rename nir_var_private to nir_var_shader_temp | Karol Herbst | 2019-01-19 | 17 | -29/+29
| | | | | | | | Signed-off-by: Karol Herbst <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel/genxml: add missing MI_PREDICATE compare operations | Lionel Landwerlin | 2019-01-19 | 7 | -1/+12
| | | | | | | | Doesn't save us a great deal of lines but at least they get decoded in aubinators. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* anv: document cache flushes & invalidations | Lionel Landwerlin | 2019-01-19 | 1 | -0/+67
    A little bit of explanation regarding how vkCmdPipelineBarrier() works.
    v2: Avoid referring to data port cache when it's actually sampler caches (Jason)
        Complete explanation for indirect draws (Jason)
    v3: s/samplers/sampler/ (Jason)
        s/UBOs/data port/
        Add documentation for VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT_EXT (Lionel)
    Signed-off-by: Lionel Landwerlin <[email protected]>
    Acked-by: Eric Engestrom <[email protected]> (v1)
    Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* anv: narrow flushing of the render target to buffer writes | Lionel Landwerlin | 2019-01-19 | 6 | -20/+15
    In commit 9a7b3199037ac4 ("anv/query: flush render target before copying results") we tracked all the render target writes in order to apply a flush in vkCopyQueryResults(). But we can narrow this down to only the cases where we write a buffer (which is the only input of vkCopyQueryResults).
    v2: Drop newer render target write flags introduced by 1952fd8d2ce905 ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
    Signed-off-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]> (v1)
* glsl: be much more aggressive when skipping shader compilation | Timothy Arceri | 2019-01-19 | 2 | -6/+10
| | | | | | | | | | | | | | | | | | | | Currently we only add a cache key for a shader once it is linked. However games like Team Fortress 2 compile a whole bunch of shaders which are never actually linked. These compiled shaders can take up a bunch of memory. This patch changes things so that we add the key for the shader to the cache as soon as it is compiled. This means on a warm cache we can avoid the wasted memory from these shaders. Worst case scenario is we need to compile the shaders at link time but this can happen anyway if the shader has been evicted from the cache. Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a warm cache from start up to the game menu. V2: only add key to cache when compilation is successful. Acked-by: Marek Olšák <[email protected]>
* intel/fs: Promote execution type to 32-bit when any half-float conversion is needed. | Francisco Jerez | 2019-01-18 | 1 | -0/+21
    The docs are fairly incomplete and inconsistent about it, but this seems to be the reason why half-float destinations are required to be DWORD-aligned on BDW+ projects. This way the regioning lowering pass will make sure that the destination components of W to HF and HF to W conversions are aligned like the corresponding conversion operation with 32-bit execution data type.
    Tested-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* ac/nir_to_llvm: fix interpolateAt* for arrays | Timothy Arceri | 2019-01-19 | 1 | -19/+58
| | | | | | | | | | | This builds on the recent interpolate fix by Rhys ee8488ea3b99. This fixes the arb_gpu_shader5 interpolateAt* tests that contain arrays. Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics") Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* Revert "glsl: be much more aggressive when skipping shader compilation" | Timothy Arceri | 2019-01-19 | 2 | -10/+6
| | | | | | This reverts commit 64b8c86d37ebb1e1d286c69d642d52b7bcf051d3. Reverting for now as it was causing some segfaults.
* freedreno/a6xx: Turn on texture tiling by default | Kristian H. Kristensen | 2019-01-18 | 7 | -43/+64
| | | | | | | | The color swap isn't available for tiled formats and it's not needed either. We pick one channel order and use for all non-linear formats. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* freedreno: Synchronize batch and flush for staging resource | Kristian H. Kristensen | 2019-01-18 | 1 | -1/+15
| | | | | | | | Staging blit downloads would wait on the src resource instead of the staging resource and didn't make sure to submit the blit batch first. Signed-off-by: Kristian H. Kristensen <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* glsl: be much more aggressive when skipping shader compilation | Timothy Arceri | 2019-01-19 | 2 | -6/+10
| | | | | | | | | | | | | | | | | | Currently we only add a cache key for a shader once it is linked. However games like Team Fortress 2 compile a whole bunch of shaders which are never actually linked. These compiled shaders can take up a bunch of memory. This patch changes things so that we add the key for the shader to the cache as soon as it is compiled. This means on a warm cache we can avoid the wasted memory from these shaders. Worst case scenario is we need to compile the shaders at link time but this can happen anyway if the shader has been evicted from the cache. Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a warm cache from start up to the game menu. Acked-by: Marek Olšák <[email protected]>
* glsl: don't skip GLSL IR opts on first-time compiles | Timothy Arceri | 2019-01-19 | 3 | -34/+2
    This basically reverts c2bc0aa7b188. By running the opts we reduce memory use in Team Fortress 2 from 1.5GB -> 1.3GB from start-up to game menu.
    This will likely increase Deus Ex start-up times as per commit c2bc0aa7b188. However, currently 32-bit games like Team Fortress 2 can run out of memory on low-memory systems, so that seems more important.
    Reviewed-by: Marek Olšák <[email protected]>
* nir: check NIR_SKIP to skip passes by name | Caio Marcelo de Oliveira Filho | 2019-01-18 | 3 | -0/+40
    Passes whose function names are listed, separated by commas, in the NIR_SKIP environment variable will be skipped in debug mode. The mechanism is hooked into the _PASS macro, like NIR_PRINT.
    The extra macro NIR_SKIP is available as a developer convenience, to skip at points other than the passes' entry points.
    v2: Fix typo in NIR_SKIP macro. (Bas)
    Reviewed-by: Alejandro Piñeiro <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
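A rough sketch of the lookup described above: test whether a pass name appears in a comma-separated list read from the NIR_SKIP environment variable. Only the variable name comes from the commit message. The helpers `name_in_comma_list` and `should_skip_pass` are hypothetical, and the sketch does not reproduce how the real _PASS macro is wired up.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if `name` is exactly one of the comma-separated
 * entries in `list`. Prefixes do not match ("nir_opt" is not "nir_opt_dce"). */
static bool
name_in_comma_list(const char *list, const char *name)
{
   if (list == NULL || name == NULL || name[0] == '\0')
      return false;

   size_t name_len = strlen(name);
   const char *p = list;
   while (*p != '\0') {
      size_t len = strcspn(p, ",");   /* length of the current entry */
      if (len == name_len && strncmp(p, name, len) == 0)
         return true;
      p += len;
      if (*p == ',')
         p++;                         /* step over the separator */
   }
   return false;
}

/* Hypothetical wrapper: consult the NIR_SKIP environment variable. */
static bool
should_skip_pass(const char *pass_name)
{
   return name_in_comma_list(getenv("NIR_SKIP"), pass_name);
}
```

Under these assumptions, running with `NIR_SKIP=nir_opt_dce,nir_opt_cse` would make `should_skip_pass("nir_opt_cse")` return true.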
* anv: Implement VK_EXT_conditional_rendering for gen 7.5+ | Danylo Piliaiev | 2019-01-18 | 7 | -14/+265
    Conditional rendering affects the following functions:
    - vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
    - vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
    - vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
    - vkCmdClearAttachments
    The value from the conditional buffer is cached in a designated register; MI_PREDICATE is emitted every time conditional rendering is enabled and the command requires it.
    v2: by Jason Ekstrand
    - Use vk_find_struct_const instead of manually looping
    - Move draw count loading to prepare function
    - Zero the top 32-bits of MI_ALU_REG15
    v3: Apply pipeline flush before accessing conditional buffer (The issue was found by Samuel Iglesias)
    v4:
    - Remove support of Haswell due to possible hardware bug
    - Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to define registers in one place.
    v5: thanks to Jason Ekstrand and Lionel Landwerlin
    - Work around the fact that MI_PREDICATE_RESULT is not accessible on Haswell by manually calculating MI_PREDICATE_RESULT and re-emitting MI_PREDICATE when necessary.
    v6: suggested by Lionel Landwerlin
    - Instead of calculating the result of predicate once, re-emit MI_PREDICATE to make it easier to investigate error states.
    v7: suggested by Jason
    - Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set.
    v8: suggested by Lionel
    - Precompute conditional predicate's result to support secondary command buffers.
    - Make prepare_for_draw_count_predicate more readable.
    Signed-off-by: Danylo Piliaiev <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Implement VK_KHR_draw_indirect_count for gen 7+ | Danylo Piliaiev | 2019-01-18 | 2 | -0/+148
    v2: by Jason Ekstrand
    - Move out of the draw loop population of registers which aren't changed in it.
    - Remove dependency on ALU registers.
    - Clarify usage of PIPE_CONTROL
    - Without usage of ALU registers patch works for gen7+
    v3: set pending_pipe_bits |= ANV_PIPE_RENDER_TARGET_WRITES
    Signed-off-by: Danylo Piliaiev <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* bin/meson-cmd-extract: Also handle cross and native files | Dylan Baker | 2019-01-18 | 1 | -0/+11
| | | | | | | Native file support in command line serialization isn't present in meson 0.49, but will be for 0.49.1 and 0.50 Reviewed-by: Eric Engestrom <[email protected]>
* anv: Re-sort the extensions list | Jason Ekstrand | 2019-01-18 | 1 | -6/+6
| | | | | | I like to keep things in good order so that you can find them. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: Don't touch accumulator destination while applying regioning alignment rule | Jason Ekstrand | 2019-01-18 | 1 | -1/+23
    In some shaders, you can end up with a stride in the source of a SHADER_OPCODE_MULH. One way this can happen is if the MULH is acting on the top bits of a 64-bit value due to 64-bit integer lowering. In this case, the compiler will produce something like this:

        mul(8)  acc0<1>UD  g5<8,4,2>UD  0x0004UW      { align1 1Q };
        mach(8) g6<1>UD    g5<8,4,2>UD  0x00000004UD  { align1 1Q AccWrEnable };

    The new region fixup pass looks at the MUL and sees a strided source and unstrided destination and determines that the sequence is illegal. It then attempts to fix the illegal stride by replacing the destination of the MUL with a temporary and emitting a MOV into the accumulator:

        mul(8)  g9<2>UD    g5<8,4,2>UD  0x0004UW      { align1 1Q };
        mov(8)  acc0<1>UD  g9<8,4,2>UD                { align1 1Q };
        mach(8) g6<1>UD    g5<8,4,2>UD  0x00000004UD  { align1 1Q AccWrEnable };

    Unfortunately, this new sequence isn't correct because MOV accesses the accumulator with a different precision to MUL and, instead of filling the bottom 32 bits with the source and zeroing the top 32 bits, it leaves the top 32 (or maybe 31) bits alone and full of garbage. When the MACH comes along and tries to complete the multiplication, the result is correct in the bottom 32 bits (which we throw away) and garbage in the top 32 bits which are actually returned by MACH.

    This commit does two things: First, it adds an assert to ensure that we don't try to rewrite accumulator destinations of MUL instructions so we can avoid this precision issue. Second, it modifies required_dst_byte_stride to require a tightly packed stride so that we fix up the sources instead and the actual code which gets emitted is this:

        mov(8)  g9<1>UD    g5<8,4,2>UD                { align1 1Q };
        mul(8)  acc0<1>UD  g9<8,8,1>UD  0x0004UW      { align1 1Q };
        mach(8) g6<1>UD    g5<8,4,2>UD  0x00000004UD  { align1 1Q AccWrEnable };

    Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
    Reviewed-by: Francisco Jerez <[email protected]>
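To make the "bottom 32 bits are thrown away, top 32 bits are returned" point concrete, here is a small reference computation of an unsigned multiply-high in portable C. It only illustrates the arithmetic result a MULH is expected to produce. It does not model the accumulator or the MUL/MACH instruction pair, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference helpers. The full product of two 32-bit values
 * needs 64 bits; multiply-high returns the upper half of that product. */
static uint32_t
umul_high32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* The lower half, which a multiply-high discards. */
static uint32_t
umul_low32(uint32_t a, uint32_t b)
{
   return (uint32_t)((uint64_t)a * (uint64_t)b);
}
```

Any garbage in the upper half of the intermediate product therefore lands directly in the value that multiply-high returns, which is why the commit avoids rewriting the accumulator write.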
* intel/eu: Stop overriding exec sizes in send_indirect_message | Jason Ekstrand | 2019-01-18 | 1 | -3/+0
| | | | | | | | For a long time, we based exec sizes on destination register widths. We've not been doing that since 1ca3a9442760b6f7 but a few remnants accidentally remained. Reviewed-by: Anuj Phogat <[email protected]>
* radv: initialize the per-queue descriptor BO only once | Samuel Pitoiset | 2019-01-18 | 1 | -24/+23
| | | | | | | Totally useless to write the descriptors inside the loop. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: do not write unused descriptors to the per-queue BO | Samuel Pitoiset | 2019-01-18 | 1 | -124/+128
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: reduce size of the per-queue descriptor BO | Samuel Pitoiset | 2019-01-18 | 1 | -1/+1
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: drop unused code related to 16 sample locations | Samuel Pitoiset | 2019-01-18 | 3 | -13/+0
| | | | | | | The driver only supports up to 8 sample locations. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* gm107/ir: disable TEXS for tex with derivAll set | Karol Herbst | 2019-01-18 | 2 | -1/+3
    Fixes deqp tests:
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_fixed_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_float_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isamplercube_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usamplercube_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_fixed_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_float_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isampler3d_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usampler3d_vertex
        dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2dshadow_vertex
        dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_fixed_vertex
        dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_float_vertex
        dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.isampler3d_vertex
        dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.usampler3d_vertex
        dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
    Fixes: f821e80213e38e93f96255b3deacb737a600ed40 "gm107/ir: use scalar tex instructions where possible"
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions | Karol Herbst | 2019-01-18 | 1 | -1/+1
    Fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3
    CC: <[email protected]>
    Signed-off-by: Karol Herbst <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* nir: Account for atomics in copy propagation. | Bas Nieuwenhuizen | 2019-01-18 | 1 | -1/+24
| | | | | | | | | | | Otherwise writes get propagated across atomics if no barrier is used. Without barrier writes should still be visible in the same invocation, so an atomic has to be considered a write. CC: <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Fixes: b3c61469255 "nir: Copy propagation between blocks" Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
* anv/tests: Adding test for the state_pool padding. | Rafael Antognolli | 2019-01-17 | 2 | -1/+75
| | | | | | | Add a test that checks that we can use the extra space allocated for padding while allocating larger anv_states. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/allocator: Add support for non-userptr. | Rafael Antognolli | 2019-01-17 | 1 | -46/+71
    If softpin is supported, create new BOs for the required size and add the respective BO maps. The other main change of this commit is that anv_block_pool_map() now returns the map for the BO that the given offset is part of. So there's no block_pool->map access anymore (when softpin is used).
    v3:
    - set fd to -1 on softpin case (Jason)
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Remove state flush. | Rafael Antognolli | 2019-01-17 | 10 | -51/+2
| | | | | | | We have all the state buffers snooped, so we don't need to clflush everything anymore. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/allocator: Enable snooping on block pool and anv_bo_pool BOs. | Rafael Antognolli | 2019-01-17 | 1 | -10/+16
| | | | | | | | | | | | | | | | | | We are not going to use userptr for anv block pool BOs anymore. However, so far we have been relying on the fact that userptr BOs are snooped on non-llc platforms. Let's make sure that the block pool BOs are still snooped, and we can also remove the clflush'ing that we do on all state buffers. And since we plan to remove the flushes, set the anv_bo_pool BOs to cached (snooped on non-LLC platforms) too. For LLC platforms, they are all cached by default, so this becomes a no-op. v5: - Add snooping to anv_bo_pool BOs too (Jason). - Remove anv_gem_set_domain. Reviewed-by: Jason Ekstrand <[email protected]>
* anv/allocator: Add padding information. | Rafael Antognolli | 2019-01-17 | 3 | -10/+49
    It's possible that we still have some space left in the block pool, but we try to allocate a state larger than that space. This means such a state would start somewhere within the range of the old block_pool, and end after that range, within the range of the new size.
    That's fine when we use userptr, since the memory in the block pool is CPU mapped contiguously. However, by the end of this series, we will have the block_pool split into different BOs, with different CPU mapping ranges that are not necessarily contiguous. So we must avoid the case of a given state being part of two different BOs in the block pool.
    This commit solves the issue by detecting that we are growing the block_pool even though we are not at the end of the range. If that happens, we don't use the space left at the end of the old size, and consider it as "padding" that can't be used in the allocation. We update the size requested from the block pool to take the padding into account, and return the offset after the padding, which happens to be at the start of the new address range.
    Additionally, we return the amount of padding we used, so the caller knows that this happened and can return that padding back into a list of free states that can be reused later. This way we hopefully don't waste any space, but also avoid having a state split between two different BOs.
    v3:
    - Calculate offset + padding at anv_block_pool_alloc_new (Jason).
    v4:
    - Remove extra "leftover".
    Reviewed-by: Jason Ekstrand <[email protected]>
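The padding idea can be sketched with a toy bump allocator. This is not the anv allocator: `struct pool_sketch`, `pool_alloc_with_padding`, and the fixed `grow_by` step are all hypothetical, and the real code also handles alignment and concurrency. The sketch only shows the decision described above: if the request does not fit in what is left of the current range, skip the leftover, report it as padding, and start the allocation at the beginning of the newly added range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy pool: [0, end) is backed, `next` is the first free byte. */
struct pool_sketch {
   uint32_t next;
   uint32_t end;
};

/* Returns the offset of the allocation. *padding receives the number of
 * bytes skipped at the end of the old range (0 if the request fit). */
static uint32_t
pool_alloc_with_padding(struct pool_sketch *pool, uint32_t size,
                        uint32_t grow_by, uint32_t *padding)
{
   /* Assumption of this sketch: one growth step always fits a request. */
   assert(size > 0 && size <= grow_by);

   if (pool->end - pool->next >= size) {
      /* Fits in the current range: plain bump allocation. */
      uint32_t offset = pool->next;
      *padding = 0;
      pool->next += size;
      return offset;
   }

   /* Does not fit: the leftover becomes padding and the allocation starts
    * at the first byte of the new range, so it never straddles two ranges. */
   uint32_t offset = pool->end;
   *padding = pool->end - pool->next;
   pool->end += grow_by;
   pool->next = offset + size;
   return offset;
}
```

A caller can hand the skipped region, `[offset - padding, offset)`, back to a free list so the space is not lost, which is what the commit message describes.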
* anv/allocator: Rework chunk return to the state pool. | Rafael Antognolli | 2019-01-17 | 1 | -23/+64
    This commit reworks the code that splits and returns chunks back to the state pool, while still keeping the same logic.
    The original code would get a chunk larger than we need and split it into pool->block_size chunks. Then it would return all but the first one, and would split that first one into alloc_size chunks. Then it would keep the first one (for the allocation), and return the others back to the pool.
    The new anv_state_pool_return_chunk() function will take a chunk (with the alloc_size part removed), and a small_size hint. It then splits that chunk into pool->block_size'd chunks, and if there's some space still left, splits that into small_size chunks. small_size in this case is the same size as alloc_size.
    The idea is to keep the same logic, but implement it in a way that lets us reuse it to return other chunks to the pool when we are growing the buffer.
    v2:
    - Include Jason's suggestions to the algorithm that returns chunks.
    - Update comments.
    v3:
    - Disallow returning 0 blocks (Jason).
    - fix min_size in the loop (Jason).
    - remove temporary variables (Jason)
    v4:
    - return_chunk() should never return blocks larger than pool->block_size.
    Reviewed-by: Jason Ekstrand <[email protected]>
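The splitting order described above (block-sized pieces first, then small_size pieces from whatever is left) can be sketched as follows. This is a simplified illustration, not anv_state_pool_return_chunk() itself: `struct piece` and `split_chunk` are hypothetical, pieces are written to an output array instead of a free list, alignment is ignored, and any remainder smaller than small_size is simply dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one returned piece. */
struct piece {
   uint32_t offset;
   uint32_t size;
};

/* Splits [offset, offset + size) into block_size pieces, then splits the
 * leftover into small_size pieces. Returns the number of pieces written. */
static size_t
split_chunk(uint32_t offset, uint32_t size, uint32_t block_size,
            uint32_t small_size, struct piece *out, size_t max_out)
{
   /* Mirrors the "disallow returning 0 blocks" note: empty input is a bug. */
   assert(size > 0);
   assert(small_size > 0 && small_size <= block_size);

   size_t n = 0;

   /* No piece is ever larger than block_size. */
   while (size >= block_size && n < max_out) {
      out[n++] = (struct piece){ offset, block_size };
      offset += block_size;
      size -= block_size;
   }

   /* Leftover space goes back as small_size pieces. */
   while (size >= small_size && n < max_out) {
      out[n++] = (struct piece){ offset, small_size };
      offset += small_size;
      size -= small_size;
   }

   return n;
}
```

For example, a chunk of two 4096-byte blocks plus 192 bytes, with a small_size of 64, yields two block-sized pieces followed by three 64-byte pieces.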
* anv: Remove some asserts. | Rafael Antognolli | 2019-01-17 | 1 | -3/+0
| | | | | | | They won't be true anymore once we add support for multiple BOs with non-userptr. Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Validate the list of BOs from the block pool. | Rafael Antognolli | 2019-01-17 | 1 | -5/+49
    We now have multiple BOs in the block pool, but sometimes we still reference only the first one in some instructions, and use relative offsets in others. So we must be sure to add all the BOs from the block pool to the validation list when submitting commands.
    v2:
    - Don't add block pool BOs to the dependency list right before execbuf (Jason)
    - Call anv_execbuf_add_bo() to each BO in the block pools (Jason)
    - Use anv_execbuf_add_bo_set() to add surface state dependencies to execbuf.
    v3:
    - Add comment to the non-softpin case (Jason).
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Split code to add BO dependencies to execbuf. | Rafael Antognolli | 2019-01-17 | 1 | -23/+39
| | | | | | | | | This part of the anv_execbuf_add_bo() code is totally independent of the BO being added. Let's split it out, so we can reuse it later. v3: rename to anv_execbuf_add_bo_set (Jason). Reviewed-by: Jason Ekstrand <[email protected]>
* anv/allocator: Add support for a list of BOs in block pool. | Rafael Antognolli | 2019-01-17 | 2 | -11/+59
    So far we use only one BO (the last one created) in the block pool. When we switch to not use the userptr API, we will need multiple BOs. So add code now to store multiple BOs in the block pool.
    This has several implications, the main one being that we can't use pool->map as before. For that reason we update the getter to find which BO a given offset is part of, and return the respective map.
    v3:
    - Simplify anv_block_pool_map (Jason).
    - Use fixed size array for anv_bo's (Jason)
    v4:
    - Respect the order (item, container) in anv_block_pool_foreach_bo (Jason).
    Reviewed-by: Jason Ekstrand <[email protected]>
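A minimal sketch of the getter described above: given a pool-relative offset, find the BO whose range contains it and return the matching CPU address. The types and names here (`struct bo_sketch`, `struct bo_list_sketch`, `bo_list_map`, `MAX_BOS_SKETCH`) are hypothetical stand-ins, not the anv structures. The only details borrowed from the commit message are the fixed-size array of BOs and the lookup by offset.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BOS_SKETCH 8   /* hypothetical fixed capacity */

/* Hypothetical BO record: covers pool offsets [start, start + size). */
struct bo_sketch {
   uint32_t start;
   uint32_t size;
   void *map;              /* CPU mapping of the first byte of this BO */
};

struct bo_list_sketch {
   struct bo_sketch bos[MAX_BOS_SKETCH];
   unsigned count;
};

/* Returns the CPU address for `offset`, or NULL if no BO contains it. */
static void *
bo_list_map(const struct bo_list_sketch *list, uint32_t offset)
{
   for (unsigned i = 0; i < list->count; i++) {
      const struct bo_sketch *bo = &list->bos[i];
      /* Written as a subtraction so start + size cannot overflow. */
      if (offset >= bo->start && offset - bo->start < bo->size)
         return (char *)bo->map + (offset - bo->start);
   }
   return NULL;
}
```

Because each BO has its own mapping, the result is only valid within that BO, which is why the related padding commit keeps a single state from spanning two BOs.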