path: root/src
Commit message | Author | Age | Files | Lines
* util/os_file: actually return the error read() gave us | Eric Engestrom | 2019-06-09 | 1 | -1/+3
  Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
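The behaviour named in the subject line above can be sketched as follows. This is a minimal illustration and not Mesa's `os_read_file()`: the helper name `read_all_sketch`, its signature, the `EINTR` retry, and the return convention (byte count on success, negative `errno` value on failure) are all assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Mesa's os_read_file()): read up to `len` bytes
 * from `fd`. Returns the number of bytes read, or a negative errno value
 * if read() fails, so the caller sees the real error. */
static ssize_t
read_all_sketch(int fd, char *buf, size_t len)
{
   size_t total = 0;

   while (total < len) {
      ssize_t n = read(fd, buf + total, len - total);
      if (n < 0) {
         if (errno == EINTR)
            continue;      /* interrupted: retry (an assumption of this sketch) */
         return -errno;    /* propagate the error read() reported */
      }
      if (n == 0)
         break;            /* end of file */
      total += (size_t)n;
   }
   return (ssize_t)total;
}
```

Calling the helper on an invalid descriptor such as `-1` yields `-EBADF` instead of a generic failure value.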
* virgl: Work around possible memory exhaustion | Alexandros Frantzis | 2019-06-07 | 3 | -3/+22
  Since we don't normally flush before performing copy transfers, it's possible in some scenarios to use too much memory for staging resources and start failing. This can happen either because we exhaust the total available memory (including system memory virtio-gpu swaps out to), or, more commonly, because the total size of resources in a command buffer doesn't fit in virtio-gpu video memory.

  To reduce the chances of this happening, force a flush before a copy transfer if the total size of queued staging resources exceeds a certain limit. Since after a flush any queued staging resources will be eventually released, this ensures both that each command buffer doesn't require too much video memory, and that we don't end up consuming too much memory for staging resources in total.

  Fixes kernel errors reported when running texture_upload tests in glbench.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
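The flush heuristic described in this commit message can be sketched roughly as below. Everything here is hypothetical: the struct, the field names, the helper names, and the idea of passing the limit as a parameter are assumptions for illustration, and the real virgl code and its actual limit are not shown in this log.

```c
#include <stddef.h>

/* Hypothetical context; field names are assumptions, not virgl's. */
struct staging_ctx_sketch {
   size_t queued_staging_size; /* bytes of staging data not yet flushed */
   size_t flush_limit;         /* assumed threshold, in bytes */
   unsigned flush_count;       /* forced flushes so far */
};

/* Stand-in for a real flush: queued staging resources get released. */
static void
flush_sketch(struct staging_ctx_sketch *ctx)
{
   ctx->flush_count++;
   ctx->queued_staging_size = 0;
}

/* Force a flush before the copy transfer if too much staging is queued. */
static void
queue_copy_transfer_sketch(struct staging_ctx_sketch *ctx, size_t size)
{
   if (ctx->queued_staging_size > ctx->flush_limit)
      flush_sketch(ctx);
   ctx->queued_staging_size += size;
}

/* Queue `count` transfers of `size` bytes and report the forced flushes. */
static unsigned
count_forced_flushes_sketch(size_t limit, size_t size, unsigned count)
{
   struct staging_ctx_sketch ctx = { 0, limit, 0 };

   for (unsigned i = 0; i < count; i++)
      queue_copy_transfer_sketch(&ctx, size);
   return ctx.flush_count;
}
```

With a 1024-byte limit and 512-byte transfers, the fourth transfer is the first to find more than 1024 bytes queued, so it triggers the first forced flush.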
* virgl: Remove incorrect resource wait condition | Alexandros Frantzis | 2019-06-07 | 1 | -13/+0
  Now that we have copy transfers in place, we can remove the incorrect resource wait condition. Copy transfers and other optimizations minimize the performance impact of this removal, while providing the correct behavior.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Use copy transfers for textures | Alexandros Frantzis | 2019-06-07 | 2 | -9/+87
  Extend copy transfers to also be used for busy textures.

  Performance results:
    Unigine Valley, qemu
      before: 22.7 FPS
      after:  23.1 FPS

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Use buffer copy transfers to avoid waiting when mapping | Alexandros Frantzis | 2019-06-07 | 6 | -6/+137
  We typically need to wait for a buffer to become ready before mapping, so that we don't write new contents while the host is still using the old contents. However, if we are allowed to discard the contents of the mapped buffer range, then we can avoid waiting by using a staging buffer range which we guarantee to never be busy, copying from the staging buffer range to the target buffer in the host.

  This commit implements this optimization by utilizing a dedicated u_upload_mgr for the staging buffer.

  Performance results:
    Twilight Struggle (Steam/Proton), qemu
      before: 7 FPS
      after:  25 FPS
    glmark2 ubo, qemu
      before: 38 FPS
      after:  331 FPS

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Suggested-by: Gurchetan Singh <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
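The mapping decision this message describes can be summarised as a small decision function. This is only a sketch: the enum, the function name, and the reduction of the real conditions to two booleans are assumptions, and the actual driver logic is likely to consider more state than this.

```c
#include <stdbool.h>

/* Hypothetical outcome of a map request; names are assumptions. */
enum map_path_sketch {
   MAP_DIRECT_SKETCH,        /* buffer is idle: map it directly */
   MAP_WAIT_THEN_MAP_SKETCH, /* busy and contents must be kept: wait first */
   MAP_STAGING_COPY_SKETCH,  /* busy but range may be discarded: write to a
                              * never-busy staging range, then copy on the host */
};

static enum map_path_sketch
choose_map_path_sketch(bool buffer_busy, bool may_discard_range)
{
   if (!buffer_busy)
      return MAP_DIRECT_SKETCH;
   return may_discard_range ? MAP_STAGING_COPY_SKETCH
                            : MAP_WAIT_THEN_MAP_SKETCH;
}
```

The point of the third path is that the guest never blocks: it writes into the staging range immediately and lets the host perform the copy into the busy buffer later.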
* virgl: Support copy transfers | Alexandros Frantzis | 2019-06-07 | 5 | -5/+70
  Support transfers that use a different resource as the source of data to transfer. This will be used in upcoming commits to send data to host buffers through a transfer upload buffer, in order to avoid waiting when the buffer resource is busy.

  Note that we don't support queueing copy transfers in the transfer queue. Copy transfers should be emitted directly in the command queue, which allows us to avoid flushes before them and leads to better performance.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Add copy_transfer3d definitions | Alexandros Frantzis | 2019-06-07 | 2 | -0/+9
  Introduce definitions for the copy_transfer3d protocol command and virgl capability. This command transfers data to the host by copying through another resource, and will be used in upcoming commits to avoid waiting when transferring data for busy resources.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Make VIRGL_BIND_STAGING resources cacheable | Alexandros Frantzis | 2019-06-07 | 2 | -2/+4
  This could help performance when trying to recreate such resources for copy transfers.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Support VIRGL_BIND_STAGING | Alexandros Frantzis | 2019-06-07 | 3 | -4/+16
  Support a new virgl bind type for staging buffers which don't require dedicated host-side storage. These will be used to implement copy transfers.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK | Alexandros Frantzis | 2019-06-07 | 1 | -9/+12
  If we are not allowed to block, and we know that we will have to wait, either because the resource is busy, or because it will become busy due to a readback, return early to avoid performing an incomplete transfer_get. Such an incomplete transfer_get may finish at any time, during which another unsynchronized map could write to the resource contents, leaving the contents in an undefined state.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Suggested-by: Chia-I Wu <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Deduplicate checks for resource caching | Alexandros Frantzis | 2019-06-07 | 4 | -20/+14
  Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate code snippets. Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they don't go through the cache in the previous commit.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: Don't try to use cached resources for legacy fences | Alexandros Frantzis | 2019-06-07 | 2 | -6/+12
  Resources for fences should not be from the cache, since we are basing the fence status on the resource creation busy status.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: More info about chosen alignment value | Alexandros Frantzis | 2019-06-07 | 1 | -0/+5
  Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT was chosen.

  Signed-off-by: Alexandros Frantzis <[email protected]>
  Reviewed-by: Chia-I Wu <[email protected]>
* virgl: store all info about atomic buffers | Chia-I Wu | 2019-06-07 | 2 | -16/+23
  We will need the full info. This also speeds up virgl_attach_res_atomic_buffers and fixes resource leaks when the context is destroyed.

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add shader images to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -14/+27
  It replaces virgl_context::images.

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add SSBOs to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -14/+26
  It replaces virgl_context::ssbos.

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add UBOs to virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -20/+37
  It replaces virgl_context::ubos.

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* virgl: add virgl_shader_binding_state | Chia-I Wu | 2019-06-07 | 2 | -43/+44
  virgl_shader_binding_state will be used to manage all per-stage shader bindings. For now, it manages only sampler views.

  This replaces virgl_textures_info and fixes some issues:
   - start_slot is now honored
   - views outside of [start_slot, start_slot+count) are unmodified
   - views are released when the context is destroyed

  Signed-off-by: Chia-I Wu <[email protected]>
  Reviewed-by: Alexandros Frantzis <[email protected]>
* iris: Zero shs->cbuf0 when binding a passthrough TCS | Kenneth Graunke | 2019-06-07 | 1 | -0/+16
  Fixes valgrind errors when running two CTS tests back to back:
   - KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*

  (The first test has an actual TCS, the second uses passthrough.)
* intel/blorp: Only double the fast-clear rect alignment on HSW | Jason Ekstrand | 2019-06-07 | 1 | -10/+15
  This restriction was accidentally added to the BSpec/PRM as an unrestricted restriction starting with the HSW docs and it was never removed. However, it only ever applied to HSW and actually potentially causes problems on BDW and above where we have mipmapped fast-clears.

  Reviewed-by: Nanley Chery <[email protected]>
* freedreno/a6xx: re-arrange program stageobj/group | Rob Clark | 2019-06-07 | 4 | -30/+58
  Split out a separate program config state group to run early before the other groups. This seems to help w/ intermittent "missed tiles" (although I had assumed that was a mem2gmem issue), or at least I can't reproduce that issue with this patch, but can without.

  It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.

  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix hangs with newer sqe fw | Rob Clark | 2019-06-07 | 1 | -32/+81
  With the newer (v1.76) fw, we were getting hangs (compared to older v1.66 fw). Re-work the GMEM code to structure things a bit closer to the blob. This moves some PKT7 packets from IB2 to IB1, which I think is what was confusing SQE and causing it to get stuck in an infinite loop. But in general structuring things at least closer to the same way blob does makes it easier to compare cmdstream.

  Note: this is a bit on the large side for what I'd normally consider for stable.. but right now it is looking like it is the newer fw that is headed for linux-firmware. This should defn have some soak time on master, but probably a good idea for this patch to end up in distro mesa builds by the time a630_sqe.fw hits linux-firmware.

  Cc: [email protected]
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: WFI before RB_CCU_CNTL writes | Rob Clark | 2019-06-07 | 2 | -0/+4
  This seems to be in a block of non buffered/context regs. Blob always WFIs before write, so probably a good idea.

  Annoyingly, compared to earlier gens, it is a bit harder to tell from the register offset whether it is a buffered reg, it isn't as simple as everything below 0x2000, it seems.

  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: don't pre-dispatch texture fetch on accident | Rob Clark | 2019-06-07 | 1 | -1/+4
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: fix issues with gallium HUD | Rob Clark | 2019-06-07 | 1 | -5/+8
  In some cases the draw for the text wasn't working. This seems to be fixed by resyncing some of the "golden registers" from blob (initial values were based on somewhat older blob version).

  Perhaps good to have a bit of soak time on master, but would be good to eventually land in 19.x stable branches.

  Cc: [email protected]
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kristian H. Kristensen <[email protected]>
* anv/cmd_buffer: Initialize the clear color struct for CNL+ | Nanley Chery | 2019-06-07 | 1 | -13/+7
  On CNL+, the clear color struct is composed of RGBA channel values and fields which are either reserved by the HW or used to control fast-clears. Currently anv initializes the channel values to zero and allows the other fields to be undefined. Satisfy the MBZ field requirements by removing an optimization that doesn't hold true for CNL+ and pulling in the number of dwords to initialize from ISL.

  Cc: <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* glx/windows: Fix compilation with -Werror-format | Jon Turney | 2019-06-07 | 2 | -5/+5
  Fix compilation where the DWORD type is used with a format, after -Werror-format added by c9c1e261.

  Some Win32 API types are different fundamental types in the 32-bit and 64-bit versions. This problem is then further compounded by the fact that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses the LP64 data model.

  This makes it near impossible to write printf format specifiers which are correct for all those targets.

  In the Win32 API, DWORD is an unsigned, 32-bit type. So, it is defined in terms of an unsigned long, except in the LP64 data model used by 64-bit Cygwin, where it is an unsigned int.

  It should always be safe to cast it to unsigned int and use %u or %x.

  Reviewed-by: Eric Anholt <[email protected]>
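The cast recommended at the end of this message looks roughly like the sketch below. To keep the example buildable without the Win32 headers, `dword_sketch_t` is a hypothetical stand-in for `DWORD`, and `format_error_sketch` is an invented helper. Neither name comes from the commit.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the Win32 DWORD. The real type is unsigned long
 * on most targets and unsigned int on 64-bit Cygwin, which is exactly why a
 * single format specifier cannot match it everywhere. */
typedef unsigned long dword_sketch_t;

/* Format an error code portably: cast to unsigned int, then use %u and %x.
 * Returns the number of characters snprintf() would have written. */
static int
format_error_sketch(char *out, size_t size, dword_sketch_t code)
{
   return snprintf(out, size, "error %u (0x%x)",
                   (unsigned int)code, (unsigned int)code);
}
```

Because a real `DWORD` is always 32 bits wide, the cast to `unsigned int` loses nothing on the targets listed in the message, and the format string no longer depends on the data model.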
* iris: Rename bind_state to bind_shader_state. | Kenneth Graunke | 2019-06-07 | 1 | -9/+9
  bind_state is possibly the worst name ever. For create, we used create_shader_state, which is more descriptive. Put shader in the name.
* isl: Mark enum isl_channel_select packed so it becomes 1 byte. | Kenneth Graunke | 2019-06-07 | 1 | -1/+1
  I recently discovered that the following code led to valgrind errors:

      struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
      VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));

  which is surprising, because struct isl_swizzle is simply:

      struct isl_swizzle {
         enum isl_channel_select r:4;
         enum isl_channel_select g:4;
         enum isl_channel_select b:4;
         enum isl_channel_select a:4;
      };

  and the above code initializes all of them with a C99 initializer. Iván Briano reminded me that C99 initializers don't necessarily zero padding. A quick inspection revealed that sizeof(struct isl_swizzle) was 4 (rather than the expected 2). Ian Romanick suggested changing it to uint16_t, since this is essentially dicing up an unsigned, and that worked.

  This patch marks enum isl_channel_select packed, changing its size from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes, with no bogus padding fields. This eliminates valgrind undefined memory warnings.

  These isl_swizzle values become part of our BLORP blit program keys, which are then hashed. This undefined padding was being included in the hashing, possibly leading to issues.

  I originally saw this error when running KHR-GL45.texture_size_promotion.functional in iris under valgrind.

  Reviewed-by: Jason Ekstrand <[email protected]>
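The size change described in this message can be reproduced with a small sketch. The type and enumerator names below are hypothetical stand-ins rather than the ISL definitions, and the example assumes GCC or Clang, since `__attribute__((packed))` on an enum and enum-typed bit-fields are compiler extensions or implementation-defined behaviour.

```c
#include <stddef.h>

/* Hypothetical channel selectors; names and values are assumptions. */
enum channel_plain_sketch {
   PLAIN_ZERO_SKETCH = 0,
   PLAIN_ONE_SKETCH = 1,
   PLAIN_RED_SKETCH = 4,
   PLAIN_ALPHA_SKETCH = 7,
};

/* Same values, but packed so the enum occupies a single byte (GCC/Clang). */
enum __attribute__((packed)) channel_packed_sketch {
   PACKED_ZERO_SKETCH = 0,
   PACKED_ONE_SKETCH = 1,
   PACKED_RED_SKETCH = 4,
   PACKED_ALPHA_SKETCH = 7,
};

/* Four 4-bit fields of an int-sized enum: 16 bits of data, but the struct
 * is typically rounded up to the enum's 4-byte size, leaving padding. */
struct swizzle_plain_sketch {
   enum channel_plain_sketch r:4, g:4, b:4, a:4;
};

/* With a 1-byte enum the same fields fill exactly two bytes, no padding. */
struct swizzle_packed_sketch {
   enum channel_packed_sketch r:4, g:4, b:4, a:4;
};
```

On a typical GCC or Clang target, `sizeof(struct swizzle_packed_sketch)` is 2 while the plain variant is larger, which matches the 4-byte versus 2-byte observation in the message. The padding matters because hashing or comparing the struct byte-wise reads those uninitialized bytes.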
* panfrost/ci: Texture wrap tests are legitimately fixed | Alyssa Rosenzweig | 2019-06-07 | 1 | -58/+0
  These depended on the wallpaper reload.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Lower inot to inor with 0 | Alyssa Rosenzweig | 2019-06-07 | 1 | -1/+2
  We were previously lowering to inand, but the second arg was not duplicated so inot would always return ~0. Oops.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
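The bit-level identity behind this fix can be checked with a few lines of C. The function names are hypothetical and model the operations on plain 32-bit integers. They are not Midgard compiler code, and the buggy variant assumes the missing second argument behaved as zero.

```c
#include <stdint.h>

/* Bitwise NAND and NOR on 32-bit values. */
static uint32_t inand_sketch(uint32_t a, uint32_t b) { return ~(a & b); }
static uint32_t inor_sketch(uint32_t a, uint32_t b)  { return ~(a | b); }

/* Fixed lowering: ~(a | 0) == ~a. */
static uint32_t
inot_via_inor_sketch(uint32_t a)
{
   return inor_sketch(a, 0);
}

/* Buggy lowering as described: ~(a & 0) == ~0 regardless of a.
 * (inand_sketch(a, a) would have been correct.) */
static uint32_t
inot_via_inand_zero_sketch(uint32_t a)
{
   return inand_sketch(a, 0);
}
```

NOR with zero needs no duplicated operand, which is why it is the safer lowering here.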
* panfrost/midgard: Cleanup tag fetch in disassembler | Alyssa Rosenzweig | 2019-06-07 | 1 | -2/+3
  Trivial.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Use fancy iterator | Alyssa Rosenzweig | 2019-06-07 | 1 | -1/+1
  Trivial cleanup.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Cull dead branches | Alyssa Rosenzweig | 2019-06-07 | 2 | -2/+31
  This fixes bugs with complex control flow.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Add mir_print_bundle helper | Alyssa Rosenzweig | 2019-06-07 | 2 | -0/+14
  This helps with debugging scheduling/emission.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard/disasm: Pretty-print branch tags | Alyssa Rosenzweig | 2019-06-07 | 1 | -7/+34
  Just makes it a little more obvious what's going on.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/ci: Note some since-fixed tests | Alyssa Rosenzweig | 2019-06-07 | 1 | -26/+0
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Vectorize I/O | Alyssa Rosenzweig | 2019-06-07 | 3 | -7/+18
  This uses the new mesa/st functionality for NIR I/O vectorization, which eliminates a number of corner cases (resulting in assorted dEQP failures and regressions) and should improve performance substantially due to lessened pressure on the load/store pipe.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Remove varyings delay pass | Alyssa Rosenzweig | 2019-06-07 | 2 | -75/+9
  This pass interfered with the more delicate path required for non-vectorized I/O. It's also ugly and duplicating the job of an actual honest-to-goodness scheduler.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Apply component to load_input | Alyssa Rosenzweig | 2019-06-07 | 1 | -0/+4
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* nir: fix s/&&/||/ typo | Eric Engestrom | 2019-06-07 | 1 | -1/+1
  Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* freedreno/a6xx: Drop struct stage array | Kristian H. Kristensen | 2019-06-07 | 1 | -144/+80
  This now boils down to just picking between binning or vertex shader and dummy_fs or real fs, which we can do in a couple of lines of code instead.

  The constlen logic isn't doing what it thinks it's doing, both constlens at this point

      MAX2(s[VS].constlen, align(state->bs->constlen, 4));

  are binning shader constlens. We'll have to revisit the constlen logic, but this commit doesn't change how it works.

  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: Drop support for SS6_DIRECT shader upload | Kristian H. Kristensen | 2019-06-07 | 1 | -30/+3
  a6xx only supports indirect shaders.

  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: Share shader_t_to_opcode | Kristian H. Kristensen | 2019-06-07 | 3 | -35/+21
  We have a similar function in fd6_program.c. Move to fd6_emit.h and share.

  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo | Kristian H. Kristensen | 2019-06-07 | 1 | -31/+24
  There's already a bit of duplicated logic here and tessellation will add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the process.

  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* freedreno: Move fd4_size2indextype() helper to freedreno_util.h | Kristian H. Kristensen | 2019-06-07 | 2 | -13/+13
  In preparation for refactoring fd6_draw.c a bit.

  Reviewed-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* radv: enable VK_EXT_sample_locations | Samuel Pitoiset | 2019-06-07 | 2 | -9/+1
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: enable HTILE for images that might need variable sample locations | Samuel Pitoiset | 2019-06-07 | 1 | -7/+0
  This is now supported.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: handle sample locations during automatic layout transitions | Samuel Pitoiset | 2019-06-07 | 2 | -18/+168
  From the Vulkan spec 1.1.109:

     "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. [...] and VkRenderPassSampleLocationsBeginInfoEXT can be chained from VkRenderPassBeginInfo to provide sample locations for layout transitions performed implicitly by a render pass instance."

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>
* radv: determine the first subpass id for every attachment | Samuel Pitoiset | 2019-06-07 | 2 | -1/+20
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-By: Bas Nieuwenhuizen <[email protected]>