path: root/src/gallium/drivers
Commit message (Author, Date, Files changed, Lines -/+)
* panfrost: Remove link stage for jobs (Tomeu Vizoso, 2019-05-31, 2 files, -68/+54)
    And instead, link them as they are added. Makes things a bit clearer
    and prepares future work such as FB reload jobs.
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: ci: Switch to kernel 5.2-rc2 (Tomeu Vizoso, 2019-05-31, 1 file, -4/+3)
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>
* panfrost: ci: Update expectations (Tomeu Vizoso, 2019-05-31, 1 file, -8/+3)
    A bunch of tests have been fixed, but some regressions have appeared
    on T760.
    Signed-off-by: Tomeu Vizoso <[email protected]>
* radeonsi/nir: Remove hack for builtins (Connor Abbott, 2019-05-31, 1 file, -11/+2)
    We now bounds check properly in the uniform loading fast path, so
    there's no need to disable it by pretending there are other UBO
    bindings in use. The way this looks at the variable name was causing
    problems when two piglit shaders, one with a name that triggered the
    hack and one that didn't, got hashed to the same thing after stripping
    out the names.
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: Use correct location for uniform access bound (Connor Abbott, 2019-05-31, 1 file, -1/+1)
    location is the API-level location, but driver_location is the actual
    location the uniform gets passed to the driver. This apparently only
    caused failures with builtins, where the location is 0 because it's
    represented via the state tokens instead.
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: Correctly handle double TCS/TES varyings (Connor Abbott, 2019-05-31, 1 file, -4/+28)
    ac expands the store to 32-bit components for us, but we still have to
    deal with storing up to 8 components, and when a varying is split
    across two vec4 slots we have to calculate the address again for the
    second slot, since they aren't adjacent in memory. I didn't do this on
    the ac level because we should generate better indexing arithmetic for
    the lds store, where slots are contiguous.
    Reviewed-by: Timothy Arceri <[email protected]>
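The splitting described here can be sketched in isolation: a double-precision varying expands to up to eight 32-bit components, and the upper four land in the next vec4 slot, whose base address must be computed separately because the slots are not adjacent in memory. A minimal illustrative C sketch with hypothetical slot_base_address()/store_component() helpers (not the radeonsi code):

    #include <stdint.h>

    /* Hypothetical helpers: where a given vec4 output slot lives in memory,
     * and a single 32-bit component store into that slot. */
    uint64_t slot_base_address(unsigned slot);
    void store_component(uint64_t base, unsigned chan, uint32_t value);

    /* A dvec4 varying occupies 8 32-bit components, i.e. two vec4 slots.
     * The second slot is not adjacent to the first, so its base address is
     * recomputed instead of being offset from the first slot's address. */
    void store_wide_varying(unsigned first_slot, const uint32_t comps[8],
                            unsigned num_comps)
    {
       for (unsigned i = 0; i < num_comps; i++) {
          unsigned slot = first_slot + i / 4;       /* crosses over at i == 4 */
          uint64_t base = slot_base_address(slot);  /* per-slot address */
          store_component(base, i % 4, comps[i]);
       }
    }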
* etnaviv: blt: s/TRUE/true && s/FALSE/false (Christian Gmeiner, 2019-05-31, 1 file, -6/+6)
    Signed-off-by: Christian Gmeiner <[email protected]>
* etnaviv: rs: s/TRUE/true && s/FALSE/false (Christian Gmeiner, 2019-05-31, 1 file, -8/+8)
    Signed-off-by: Christian Gmeiner <[email protected]>
* swr/rast: Enable ARB_GL_texture_buffer_range (Jan Zielinski, 2019-05-30, 1 file, -1/+1)
    No significant changes in the code were needed to enable the
    extension; just updating SWR capabilities and the documentation.
    Reviewed-by: Alok Hota <[email protected]>
* swr/rast: fix 32-bit compilation on Linux (Jan Zielinski, 2019-05-30, 1 file, -65/+0)
    Removing unused but problematic code from the simdlib header to fix a
    compilation problem on 32-bit Linux.
    Reviewed-by: Alok Hota <[email protected]>
* iris: Avoid holding the lock while allocating pages. (Kenneth Graunke, 2019-05-30, 1 file, -5/+5)
    We only need the lock for:
    1. Rummaging through the cache
    2. Allocating VMA
    We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
    SET_DOMAIN to allocate the underlying pages. The idea behind calling
    SET_DOMAIN was to avoid a lock in the kernel while allocating pages;
    now we avoid our own global lock as well.
    We do have to re-lock around VMA. Hopefully this shouldn't happen too
    much in practice, because we'll find a cached BO in the right memzone
    and not have to reallocate it.
    Reviewed-by: Chris Wilson <[email protected]>
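The locking scheme described above is a general pattern: take the mutex for the cache lookup and the VMA bookkeeping, and drop it around the expensive allocation itself. A minimal standalone C sketch with pthreads and hypothetical cache_lookup()/system_alloc()/vma_allocate() helpers (this is not the iris code):

    #include <pthread.h>
    #include <stddef.h>

    void *cache_lookup(size_t size);   /* returns NULL on a cache miss */
    void *system_alloc(size_t size);   /* expensive; pages come from the kernel */
    void  vma_allocate(void *bo);      /* assigns a GPU virtual address */

    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    void *allocate_buffer(size_t size)
    {
       pthread_mutex_lock(&cache_lock);
       void *bo = cache_lookup(size);       /* 1. rummage through the cache */
       pthread_mutex_unlock(&cache_lock);

       if (!bo)
          bo = system_alloc(size);          /* no lock held while allocating pages */

       pthread_mutex_lock(&cache_lock);     /* re-lock only around the VMA step */
       vma_allocate(bo);                    /* 2. allocate VMA */
       pthread_mutex_unlock(&cache_lock);

       return bo;
    }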
* iris: Move SET_DOMAIN to alloc_fresh_bo() (Kenneth Graunke, 2019-05-30, 1 file, -17/+15)
    Chris pointed out that the order between SET_DOMAIN and SET_TILING
    doesn't matter, so we can just do the page allocation when creating a
    new BO. Simplifies the flow a bit.
    Reviewed-by: Chris Wilson <[email protected]>
* iris: Be lazy about cleaning up purged BOs in the cache. (Kenneth Graunke, 2019-05-29, 1 file, -17/+1)
    Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.
    list_for_each_entry_safe is safe against removing the current entry,
    but iris_bo_cache_purge_bucket was potentially removing next entries
    too, which broke our saved next pointer.
    To fix this, don't bother with the iris_bo_cache_purge_bucket step.
    We just detected a single entry where the kernel has purged the BO's
    memory, and so it isn't a usable entry for our cache. We're about to
    continue the search with the next BO. If that one's purged, we'll
    clean it up too. And so on.
    We may miss cleaning up purged BOs that are further down the list
    after non-purged BOs...but that's probably fine. We still have the
    time-based cleaner (cleanup_bo_cache) which will take care of them
    eventually, and the kernel's already freed their memory, so it's not
    that harmful to have a few kicking around a little longer.
    Fixes: 6244da8e23e iris: Dig through the cache to find a BO in the right memzone
    Reviewed-by: Chris Wilson <[email protected]>
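The pitfall is generic to "safe" list iteration: such loops cache only the immediate next element before running the body, so the body may free the current node but nothing further ahead. A self-contained C sketch with a simple singly linked list (not Mesa's list implementation) showing the constraint:

    #include <stdlib.h>

    struct node {
       struct node *next;
       int purged;
    };

    /* "Safe" only against deleting 'cur' itself: 'nxt' is cached before the
     * body runs, so the body must not free any node other than 'cur'. */
    #define for_each_safe(cur, nxt, head) \
       for ((cur) = (head); (cur) && ((nxt) = (cur)->next, 1); (cur) = (nxt))

    void prune_purged(struct node **head)
    {
       struct node *cur, *nxt;
       struct node **link = head;

       for_each_safe(cur, nxt, *head) {
          if (cur->purged) {
             *link = nxt;         /* unlink and free the current node only */
             free(cur);
          } else {
             link = &cur->next;
          }
       }
    }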
* iris: Dig through the cache to find a BO in the right memzone (Kenneth Graunke, 2019-05-29, 1 file, -7/+17)
    This saves some util_vma thrash when the first entry in the cache
    happens to be in a different memory zone, but one just a tiny bit
    ahead is already there and instantly reusable.
    Hopefully the cost of a little extra searching won't break the bank -
    if it does, we can consider having separate list heads or keeping a
    separate VMA cache.
    Improves OglDrvRes performance by 22%, restoring a regression from
    deleting the bucket allocators in 694d1a08d3e5883d97d5352895f8431f.
    Thanks to Clayton Craft for alerting me to the regression.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Tidy BO sizing code and comments (Kenneth Graunke, 2019-05-29, 1 file, -12/+5)
    Buckets haven't been power of two sized in over a decade.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Move some field setting after we drop the lock. (Kenneth Graunke, 2019-05-29, 1 file, -13/+13)
    It's not much, but we may as well hold the lock for a bit less time.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Move cached BO allocation into a helper function. (Kenneth Graunke, 2019-05-29, 1 file, -44/+64)
    There's enough going on here to warrant a helper. This also simplifies
    the control flow and eliminates the last non-error-case goto.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Fall back to fresh allocations if mapping for zero-memset fails. (Kenneth Graunke, 2019-05-29, 1 file, -3/+4)
    It is unlikely that we would fail to map a cached BO in order to zero
    its contents. When we did, we would free the first BO in the cache and
    try again with the second. It's possible that this next BO already had
    a map setup, in which case we'd succeed. But if it didn't, we'd likely
    fail again in the same manner.
    There's not much point in optimizing this case (and frankly, if we're
    out of CPU-side VMA we should probably dump the cache entirely)...so
    instead, just fall back to allocating a fresh BO from the kernel,
    which will already be zeroed so we don't have to try and map it.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
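The fallback logic reduces to: try a cached object, and if preparing it (mapping and zeroing) fails, switch straight to a fresh allocation that needs no CPU map because the kernel already returns zeroed pages. A small illustrative C sketch with hypothetical cache_pop()/cache_release()/map_and_zero()/fresh_alloc_zeroed() helpers (not the iris code):

    #include <stdbool.h>
    #include <stddef.h>

    void *cache_pop(size_t size);           /* may return NULL on a miss */
    void  cache_release(void *bo);          /* give back an unusable BO */
    bool  map_and_zero(void *bo);           /* can fail, e.g. out of CPU VMA */
    void *fresh_alloc_zeroed(size_t size);  /* kernel memory arrives zeroed */

    void *alloc_zeroed_bo(size_t size)
    {
       void *bo = cache_pop(size);

       if (bo) {
          /* A cached BO has stale contents and must be mapped and cleared. */
          if (map_and_zero(bo))
             return bo;
          /* Mapping failed: don't churn through more cache entries that are
           * likely to fail the same way; fall back to a fresh allocation. */
          cache_release(bo);
       }
       return fresh_alloc_zeroed(size);
    }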
* iris: Move fresh BO allocation into a helper function. (Kenneth Graunke, 2019-05-29, 1 file, -26/+30)
    There's enough going on here to warrant a helper. More cleaning coming.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Do SET_TILING at a single point rather than in two places. (Kenneth Graunke, 2019-05-29, 1 file, -20/+20)
    Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
    In the cached case, we would retry the allocation on failure, pitching
    one BO from the cache each time. This is silly, because the only time
    it should fail is if the tiling or stride parameters are unacceptable,
    which has nothing to do with the particular BO in question. So there's
    no point in retrying - we should simply fail the allocation.
    This patch moves both calls to bo_set_tiling_internal() below the
    cache/fresh split, so we have it at a single point in time instead of
    two. To preserve the ordering between SET_TILING and SET_DOMAIN, we
    move that below as well. (I am unsure if the order matters.)
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Use the BO cache even for coherent buffers on non-LLC. (Kenneth Graunke, 2019-05-29, 1 file, -3/+0)
    We mark snooped BOs as non-reusable, so we never return them to the
    cache. This means that we'd need to call I915_GEM_SET_CACHING to make
    any BO we find in the cache snooped. But then again, any BO we freshly
    allocate from the kernel will also be non-snooped, so it has the same
    issue. There's really no reason to skip the cache - we may as well use
    it to avoid the I915_GEM_CREATE overhead.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Fix locking around vma_alloc in iris_bo_create_userptr (Kenneth Graunke, 2019-05-29, 1 file, -0/+4)
    util_vma needs to be protected by a lock. All other callers of
    vma_alloc and vma_free appear to be holding a lock already.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation. (Kenneth Graunke, 2019-05-29, 1 file, -7/+3)
    The goto jumped over the mtx_lock, but proceeded to hit the
    mtx_unlock. We can simply set the bucket to NULL and it will skip the
    cache without goto, and without messing up locking.
    Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
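The bug pattern generalizes: a goto that skips mtx_lock() but reaches a path that still calls mtx_unlock() leaves the mutex in an undefined state. Treating "no bucket" as a cache miss keeps lock and unlock strictly paired. A minimal C11-threads sketch with hypothetical bucket helpers (not the iris code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <threads.h>

    struct bucket;                              /* cache bucket, details omitted */
    extern mtx_t cache_mutex;
    struct bucket *bucket_for_size(size_t size);
    void *bucket_pop(struct bucket *b);
    void *alloc_fresh(size_t size);

    void *alloc_bo(size_t size, bool skip_cache)
    {
       /* Instead of 'if (skip_cache) goto skip;' jumping over mtx_lock(),
        * treat a NULL bucket as a cache miss, so every path that reaches
        * mtx_unlock() also went through mtx_lock(). */
       struct bucket *bucket = skip_cache ? NULL : bucket_for_size(size);
       void *bo = NULL;

       mtx_lock(&cache_mutex);
       if (bucket)
          bo = bucket_pop(bucket);
       mtx_unlock(&cache_mutex);

       return bo ? bo : alloc_fresh(size);
    }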
* radeonsi: fix timestamp queries for compute-only contexts (Marek Olšák, 2019-05-29, 1 file, -3/+5)
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Jan Vesely <[email protected]>
* Change a few frequented uses of DEBUG to !NDEBUG (Marek Olšák, 2019-05-29, 4 files, -7/+7)
    debugoptimized builds don't define NDEBUG, but they also don't define
    DEBUG. We want to enable cheap debug code for these builds. I only
    chose those occurrences that I care about.
    Reviewed-by: Mathias Fröhlich <[email protected]>
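The distinction matters because a debugoptimized build defines neither macro, so cheap checks guarded by "#if !defined(NDEBUG)" stay enabled there while "#ifdef DEBUG" code is compiled out. A tiny standalone illustration (assert() follows the same NDEBUG convention):

    #include <assert.h>
    #include <stdio.h>

    void check_state(int value)
    {
    #if !defined(NDEBUG)
       /* Cheap validation: present in debug and debugoptimized builds,
        * compiled out only when NDEBUG is defined (release builds). */
       assert(value >= 0);
    #endif

    #ifdef DEBUG
       /* Expensive diagnostics: only in full debug builds that define DEBUG. */
       printf("state = %d\n", value);
    #endif
    }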
* iris: Re-emit Surface State Base Address when context is lost. (Kenneth Graunke, 2019-05-29, 1 file, -0/+1)
    When we hit a GPU hang, we failed to reset Surface State Base Address
    right away, and would keep hanging until we filled up the binder.
    Then we'd finally get it right after a lot of repeated stumbles.
    Update it right away so we hopefully hang fewer times before
    succeeding.
* iris: Enable nir_opt_large_constants (Jason Ekstrand, 2019-05-29, 4 files, -0/+82)
    Shader-db results on Kaby Lake:
        total instructions in shared programs: 15306230 -> 15304726 (<.01%)
        instructions in affected programs: 4570 -> 3066 (-32.91%)
        helped: 16
        HURT: 0
        total cycles in shared programs: 361703436 -> 361680041 (<.01%)
        cycles in affected programs: 129388 -> 105993 (-18.08%)
        helped: 16
        HURT: 0
        LOST: 0
        GAINED: 2
    The helped programs were in XCom 2, Deus Ex: Mankind Divided, and
    Kerbal Space Program.
    Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Don't assume UBO indices are constant (Jason Ekstrand, 2019-05-29, 1 file, -1/+2)
    It will be true for the constant/system value buffer because they use
    a constant zero but it's not true in general. If we ever got here when
    the source wasn't constant, nir_src_as_uint would assert.
    Reviewed-by: Kenneth Graunke <[email protected]>
    Cc: [email protected]
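The general rule with NIR's convenience accessors is to test before extracting: nir_src_as_uint() asserts when the source is not a constant, so any path that can see a non-constant index needs a guard. A rough sketch of that guarded pattern, relying on the Mesa headers (the surrounding pass is omitted, and this is not the actual iris change):

    #include "nir.h"

    static void
    visit_ubo_index(nir_intrinsic_instr *intrin)
    {
       nir_src index = intrin->src[0];

       if (nir_src_is_const(index)) {
          /* Constant buffer index, e.g. the constant/system-value buffer
           * always uses a literal zero. */
          unsigned block = nir_src_as_uint(index);
          (void)block;
       } else {
          /* Non-constant UBO index: calling nir_src_as_uint() here would
           * assert, so take the indirect path instead. */
       }
    }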
* iris: Move upload_ubo_ssbo_surf_state to iris_program.c (Jason Ekstrand, 2019-05-29, 3 files, -39/+56)
    Reviewed-by: Kenneth Graunke <[email protected]>
* svga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS (Brian Paul, 2019-05-29, 1 file, -1/+2)
    In case the device reports 15 (or more) buffers.
    Reviewed-by: Charmaine Lee <[email protected]>
* iris: Clone before calling nir_strip and serializing (Kenneth Graunke, 2019-05-29, 1 file, -6/+8)
    This is non-destructive and leaves the debugging information in place.
    Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Only store the SHA1 of the NIR in iris_uncompiled_shader (Kenneth Graunke, 2019-05-29, 3 files, -13/+7)
    Jason pointed out that we don't need to keep an entire copy of the
    serialized NIR around, we just need the SHA1. This does change our
    disk cache key to be taking a SHA1 of a SHA1, which is a bit odd, but
    should work out and be faster and use less memory.
    Reviewed-by: Jason Ekstrand <[email protected]>
* spirv: Change spirv_to_nir() to return a nir_shader (Caio Marcelo de Oliveira Filho, 2019-05-29, 1 file, -4/+4)
    spirv_to_nir() returned the nir_function corresponding to the
    entrypoint, as a way to identify it. There's now a bool is_entrypoint
    in nir_function and also a helper function to get the entry_point from
    a nir_shader. The return type reflects better what the function name
    suggests.
    It also helps drivers avoid the mistake of reusing internal shader
    references after running NIR_PASS on it. When using NIR_TEST_CLONE or
    NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
    executed.
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* virgl: fix readback with pending transfers (Chia-I Wu, 2019-05-29, 1 file, -6/+26)
    When readback is true, and there are pending writes in the transfer
    queue, we should flush to avoid reading back outdated data.
    This fixes piglit arb_copy_buffer/dlist and a subtest of
    arb_copy_buffer/data-sync.
    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Alexandros Frantzis <[email protected]>
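The decision the message describes is roughly: if a mapping will read back data and the queue still holds writes touching that resource, flush first so the readback cannot observe stale contents. A schematic C sketch with hypothetical helpers (not the virgl code):

    #include <stdbool.h>

    struct context;
    struct resource;

    bool queue_has_pending_writes(struct context *ctx, struct resource *res);
    void flush_cmdbuf(struct context *ctx);
    void issue_readback(struct context *ctx, struct resource *res);

    /* Mapping for read of a resource that queued commands may still write:
     * flush so the readback sees those writes instead of outdated data. */
    void map_for_read(struct context *ctx, struct resource *res)
    {
       if (queue_has_pending_writes(ctx, res))
          flush_cmdbuf(ctx);

       issue_readback(ctx, res);
    }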
* radeonsi: Fix editorconfig (Connor Abbott, 2019-05-29, 1 file, -0/+1)
    At least on vim, indenting doesn't work without this. Copied from
    src/amd/vulkan.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up winsys creation (Marek Olšák, 2019-05-27, 2 files, -6/+26)
    - unify the code
    - choose radeon or amdgpu based on the DRM version, not based on
      which one succeeds first
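Picking the winsys from the DRM driver version rather than by trial can be outlined with plain libdrm: drmGetVersion() reports the kernel driver's name for a device fd. This is only a sketch of the idea, not the radeonsi code:

    #include <string.h>
    #include <xf86drm.h>

    /* Returns 1 if the fd is driven by the amdgpu kernel driver, 0 for the
     * older radeon driver, and -1 on error. */
    int uses_amdgpu_kernel_driver(int fd)
    {
       drmVersionPtr version = drmGetVersion(fd);
       if (!version)
          return -1;

       int is_amdgpu = strcmp(version->name, "amdgpu") == 0;
       drmFreeVersion(version);
       return is_amdgpu;
    }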
* radeonsi: allow query functions for compute-only contexts (Marek Olšák, 2019-05-27, 2 files, -4/+5)
* ac: treat Mullins as Kabini, remove the enum (Marek Olšák, 2019-05-27, 2 files, -6/+3)
    it's the same design
* etnaviv: rs: choose clear format based on block size (Christian Gmeiner, 2019-05-27, 1 file, -1/+13)
    Fixes the following piglit test and does not introduce any regressions:
        spec@ext_packed_depth_stencil@fbo-depth-gl_depth24_stencil8-blit
    Signed-off-by: Christian Gmeiner <[email protected]>
    Reviewed-by: Lucas Stach <[email protected]>
* lima/ppir: implement discard and discard_if (Vasily Khoruzhick, 2019-05-27, 7 files, -10/+253)
    This commit also adds codegen for branch since we need it for
    discard_if.
    Reviewed-by: Qiang Yu <[email protected]>
    Signed-off-by: Vasily Khoruzhick <[email protected]>
* iris: Don't flag IRIS_DIRTY_URB after BLORP operations unless it changed (Kenneth Graunke, 2019-05-26, 1 file, -0/+1)
    We already flag IRIS_DIRTY_URB when we change it, but we were
    additionally flagging it on every BLORP operation, even if we didn't
    change it.
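The underlying discipline is the usual dirty-flag rule: set the bit where the state actually changes, not unconditionally in every consumer. A generic C sketch (the names are made up, not the iris ones):

    #include <stdint.h>
    #include <string.h>

    #define DIRTY_URB (1u << 0)

    struct context {
       uint64_t dirty;
       unsigned urb_size[4];
    };

    /* Flag DIRTY_URB only when the programmed configuration really changes,
     * so later operations (such as a blit) don't have to re-flag it
     * unconditionally "just in case". */
    void set_urb_config(struct context *ctx, const unsigned size[4])
    {
       if (memcmp(ctx->urb_size, size, sizeof(ctx->urb_size)) != 0) {
          memcpy(ctx->urb_size, size, sizeof(ctx->urb_size));
          ctx->dirty |= DIRTY_URB;
       }
    }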
* panfrost/midgard: Implement fneg/fabs/fsat (Alyssa Rosenzweig, 2019-05-26, 1 file, -0/+20)
    Fix a regression I inadvertently caused by acking typeless movs before
    implementing/pushing this *whistles* Nothing to see here, move along
    folks.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* lima: fix lima_blit with non-zero level source resource (Qiang Yu, 2019-05-25, 1 file, -25/+12)
    lima_blit may blit between resources at different levels. When
    blitting from a level != 0 source, it samples from that level of the
    resource as a texture. The current texture setup doesn't respect the
    level when a mipmap filter isn't used.
    Reviewed-by: Vasily Khoruzhick <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
* lima: fix render to non-zero level texture (Qiang Yu, 2019-05-25, 1 file, -4/+6)
    The current implementation doesn't respect the level of the surface
    being rendered to.
    Reviewed-by: Vasily Khoruzhick <[email protected]>
    Signed-off-by: Qiang Yu <[email protected]>
* virgl: remove an incorrect check in virgl_res_needs_flush (Chia-I Wu, 2019-05-24, 1 file, -15/+0)
    Imagine this
        resource_copy_region(ctx, dst, ..., src, ...);
        transfer_map(ctx, src, 0, PIPE_TRANSFER_WRITE, ...);
    at the beginning of a cmdbuf. We need to flush in transfer_map so that
    the transfer is not reordered before the resource copy. The check for
    "vctx->num_draws == 0 && vctx->num_compute == 0" is not enough.
    Removing the optimization entirely. Because of the more precise
    resource tracking in the previous commit, I hope the performance
    impact is minimized. We will have to go with perfect resource
    tracking, or attempt a more limited optimization, if there are
    specific cases we really need to optimize for.
    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: reemit resources on first draw/clear/compute (Chia-I Wu, 2019-05-24, 1 file, -6/+24)
    This gives us more precise resource tracking. It can be beneficial
    because glFlush is often followed by state changes. We don't want to
    reemit resources that are going to be unbound.
    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: add missing emit_res for SO targets (Chia-I Wu, 2019-05-24, 1 file, -2/+8)
    Signed-off-by: Chia-I Wu <[email protected]>
    Reviewed-by: Gurchetan Singh <[email protected]>
* panfrost: Dereference sampled texture (Tomeu Vizoso, 2019-05-24, 1 file, -6/+3)
    We are currently leaking resources if they were sampled from. Once we
    are done with a sampler, we should dereference that resource.
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: ci: Avoid pulling Docker image on every run (Tomeu Vizoso, 2019-05-24, 1 file, -23/+29)
    Jump over the container stage if we haven't changed any of the files
    that are involved in building the container images. This saves 1-2
    minutes in each run and helps conserve resources.
    Signed-off-by: Tomeu Vizoso <[email protected]>
    Acked-by: Alyssa Rosenzweig <[email protected]>
* nir: Drop imov/fmov in favor of one mov instruction (Jason Ekstrand, 2019-05-24, 6 files, -23/+16)
    The difference between imov and fmov has been a constant source of
    confusion in NIR for years. No one really knows why we have two or
    when to use one vs. the other. The real reason is that they do
    different things in the presence of source and destination modifiers.
    However, without modifiers (which many back-ends don't have), they are
    identical. Now that we've reworked nir_lower_to_source_mods to leave
    one abs/neg instruction in place rather than replacing them with imov
    or fmov instructions, we don't need two different instructions at all
    anymore.
    Reviewed-by: Kristian H. Kristensen <[email protected]>
    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Vasily Khoruzhick <[email protected]>
    Acked-by: Rob Clark <[email protected]>