| Commit message | Author | Age | Files | Lines |
| |
Without them, the state tracker falls back to an RGBA format, but it
doesn't always manage to override the swizzle for us. So we lose the
information that the API expects an X channel, where alpha is garbage
and reads back as 1. We have no equivalent ISL RGBX format for these,
so we just use RGBA directly and override the swizzle in all cases.
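A minimal sketch of that override, assuming isl's swizzle types (the exact
hook point in the format translation is illustrative):

    /* Hypothetical sketch: when ISL lacks an RGBX equivalent, keep the
     * RGBA format but force the swizzle's alpha channel to ONE, so the
     * X channel (garbage memory) always reads back as 1. */
    static struct isl_swizzle
    override_rgbx_swizzle(struct isl_swizzle swz)
    {
       swz.a = ISL_CHANNEL_SELECT_ONE;
       return swz;
    }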
|
| |
Also copy along the separate stencil buffer if needed.
Fixes Piglit's arb_copy_image-formats.
|
| |
This should ensure the TC invalidate happens after the stall.
Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy)
from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the
original image (using R10G10B10A2_UNORM).
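A rough sketch of the ordering, assuming iris's pipe-control helper (the
flag choices are illustrative):

    /* Emit the stall and the invalidate as two PIPE_CONTROLs, so the
     * texture-cache invalidate is ordered after the stall completes. */
    iris_emit_pipe_control_flush(batch, PIPE_CONTROL_RENDER_TARGET_FLUSH |
                                        PIPE_CONTROL_CS_STALL);
    iris_emit_pipe_control_flush(batch, PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);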
|
| |
When copying/blitting with format reinterpretation, we invalidate the
texture cache before/after. Before is so the source of the copy works,
and after is to get rid of our new data in the "wrong" format to protect
future attempts to sample.
When I ported these hacks to iris, I tried to be cautious by only
bothering with the hacks if the batch referenced the BO. This makes
some sense for the before case. If it isn't referenced, the texture
cache can't really have any data for the BO (since it's also invalidated
between batches). But we still need to do the after case regardless,
as we've just polluted the cache with hazardous entries.
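A sketch of that flow; the helper names are hypothetical, not the exact
iris functions:

    static void
    copy_with_format_reinterpretation(struct iris_batch *batch,
                                      struct iris_bo *bo, bool formats_differ)
    {
       /* "Before": only needed if this batch already referenced the BO;
        * otherwise the texture cache can't hold its data, since the cache
        * is invalidated between batches anyway. */
       if (formats_differ && iris_batch_references(batch, bo))
          iris_emit_pipe_control_flush(batch,
                                       PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);

       do_blorp_copy(batch);   /* hypothetical: the actual copy */

       /* "After": always needed; the copy just polluted the texture cache
        * with entries in the "wrong" format. */
       if (formats_differ)
          iris_emit_pipe_control_flush(batch,
                                       PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
    }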
|
| |
Supported only for gen >= 9.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
| |
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.
The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements by the element size,
which could overflow, and checking for overflow is tedious.
With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
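A simplified sketch of the shape of such a wrapper (not the exact Mesa
signatures; grow_bytes_unchecked is hypothetical):

    static inline void *
    util_dynarray_grow_elts(struct util_dynarray *buf, size_t eltsize,
                            size_t nelts)
    {
       /* eltsize is a compile-time constant at each call site, so after
        * inlining this folds to one comparison and an unlikely branch. */
       if (unlikely(nelts > (SIZE_MAX - buf->size) / eltsize))
          return NULL;            /* overflow: leave the array untouched */
       return grow_bytes_unchecked(buf, nelts * eltsize);
    }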
v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size
v3:
- fix iris, lima, panfrost
Reviewed-by: Marek Olšák <[email protected]>
|
| |
Special care is needed to ensure that when we have two consecutive
calls with the same grid size, we only bail in the second one if it
either doesn't need the surface state or the surface state was already
uploaded.
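A rough sketch of that early-out (field names approximate the iris code):

    static bool
    can_skip_grid_upload(struct iris_context *ice, const uint32_t grid[3],
                         bool needs_surface)
    {
       bool grid_updated =
          memcmp(ice->state.last_grid, grid, sizeof(ice->state.last_grid)) != 0;
       /* Only bail on an identical grid if we either don't need the
        * surface state or it was already uploaded (the ref holds a
        * resource). */
       return !grid_updated &&
              (!needs_surface || ice->state.grid_size.res != NULL);
    }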
v2: Instead of having a new bool in ice->state to know whether we had
a surface, check whether we have state->ref. (Ken)
Clean up the logic a little bit by adding 'grid_updated' local. (Ken)
Reviewed-by: Sagar Ghuge <[email protected]> [v1]
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Reviewed-by: Sagar Ghuge <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
This avoids lowering of CS system values by GLSL (configured by state
tracker). In i965 we don't use that lowering, and we also shouldn't
need that in Iris.
Using it causes some unnecessary round trips between values, e.g.:
shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of
gl_LocalInvocationID, then driver rewrites those in terms of
gl_LocalInvocationIndex again. Copy propagation can make some of
those go away, but not all as seen below.
Intel SKL shader-db results:
total instructions in shared programs: 15595189 -> 15594556 (<.01%)
instructions in affected programs: 74880 -> 74247 (-0.85%)
helped: 81
HURT: 4
helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4
helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23%
HURT stats (abs) min: 1 max: 2 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46%
95% mean confidence interval for instructions value: -11.56 -3.34
95% mean confidence interval for instructions %-change: -1.91% -1.28%
Instructions are helped.
total loops in shared programs: 4831 -> 4831 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 372136618 -> 372145628 (<.01%)
cycles in affected programs: 9218230 -> 9227240 (0.10%)
helped: 131
HURT: 86
helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12
helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13%
HURT stats (abs) min: 2 max: 2442 x̄: 165.38 x̃: 6
HURT stats (rel) min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12%
95% mean confidence interval for cycles value: -2.07 85.11
95% mean confidence interval for cycles %-change: -0.22% 0.30%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 11956 -> 11950 (-0.05%)
spills in affected programs: 77 -> 71 (-7.79%)
helped: 3
HURT: 0
total fills in shared programs: 25619 -> 25549 (-0.27%)
fills in affected programs: 593 -> 523 (-11.80%)
helped: 4
HURT: 0
LOST: 0
GAINED: 0
Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%)
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
This skips GLSL IR lowering of pack/unpackHalf operations, allowing
the NIR optimizer to see them.
Improves performance in Synmark2's OglCSDof by about 2x, by cutting
about 90% of the cycles from one of the compute shaders.
shader-db statistics on Skylake:
4 compute shaders went from SIMD8 to SIMD16.
total instructions in shared programs: 15598871 -> 15542568 (-0.36%)
instructions in affected programs: 143016 -> 86713 (-39.37%)
helped: 144
HURT: 0
helped stats (abs) min: 17 max: 4669 x̄: 390.99 x̃: 164
helped stats (rel) min: 7.48% max: 85.28% x̄: 30.17% x̃: 24.22%
95% mean confidence interval for instructions value: -510.50 -271.49
95% mean confidence interval for instructions %-change: -32.70% -27.65%
Instructions are helped.
total cycles in shared programs: 371973958 -> 368902103 (-0.83%)
cycles in affected programs: 5557722 -> 2485867 (-55.27%)
helped: 144
HURT: 0
helped stats (abs) min: 106 max: 1026600 x̄: 21332.33 x̃: 1697
helped stats (rel) min: 0.53% max: 88.98% x̄: 36.12% x̃: 34.67%
95% mean confidence interval for cycles value: -41570.02 -1094.64
95% mean confidence interval for cycles %-change: -38.44% -33.80%
Cycles are helped.
total spills in shared programs: 11936 -> 11903 (-0.28%)
spills in affected programs: 110 -> 77 (-30.00%)
helped: 3
HURT: 2
total fills in shared programs: 25644 -> 25178 (-1.82%)
fills in affected programs: 677 -> 211 (-68.83%)
helped: 5
HURT: 0
total loops in shared programs: 4830 -> 4829 (-0.02%)
loops in affected programs: 1 -> 0
helped: 1
HURT: 0
|
| |
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
|
| |
bind_state is possibly the worst name ever. For create, we used
create_shader_state, which is more descriptive. Put shader in the name.
|
| |
We run a ton of backend-specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.
|
| |
Now that NIR_TEST_* doesn't swap the shader out from under us, it's
sufficient to just modify the shader rather than having to return in
case we're testing serialization or cloning.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Sagar Ghuge <[email protected]>
|
| |
Mesa measures in DWords. The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes. So, convert to bytes.
Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.
Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
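A sketch of the unit fix (variable names illustrative):

    /* SO_WRITE_OFFSET is bits 31:2 with 1:0 MBZ, i.e. a byte address,
     * so keep the stride in bytes as well; otherwise offset / stride
     * yields a vertex count that is 4x too large. */
    unsigned stride_bytes = stride_dwords * 4;
    unsigned vertex_count = so_write_offset_bytes / stride_bytes;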
|
| |
Don't have a separate mechanism for removing NIR constants from the
table. If unused, we will compact it away. The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those. Reducing unused entries is valuable because Iris uses a
reduced space for these tables.
The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions. The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.
The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting the binding table.
v2: (all from Ken)
Use BITFIELD64_MASK helper. Improve comments.
Assert the whole group is marked as used when we have indirects.
Reviewed-by: Kenneth Graunke <[email protected]>
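A sketch of the mapping helper, assuming a per-group bitmask of used
surfaces (close to, but not exactly, the iris helpers):

    static uint32_t
    group_index_to_bti(const struct iris_binding_table *bt,
                       enum iris_surface_group group, uint32_t index)
    {
       uint64_t used = bt->used_mask[group];
       if (!(used & BITFIELD64_BIT(index)))
          return IRIS_SURFACE_NOT_USED;
       /* BTI = first BTI of the group plus the number of used surfaces
        * below this index (unused ones are compacted away). */
       return bt->offsets[group] +
              util_bitcount64(used & BITFIELD64_MASK(index));
    }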
|
| |
This will make it convenient to handle compacting and printing the
binding table.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perform the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using it to index
shs->constbuf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kicks in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
Reviewed-by: Jordan Justen <[email protected]>
|
| |
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA
We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages. The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages;
now we avoid our own global lock as well.
We do have to re-lock around VMA. Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.
Reviewed-by: Chris Wilson <[email protected]>
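A sketch of the narrowed locking, simplified from the description above
(find_cached_bo is a hypothetical name):

    mtx_lock(&bufmgr->lock);
    struct iris_bo *bo = find_cached_bo(bucket, memzone);   /* 1. cache */
    mtx_unlock(&bufmgr->lock);

    if (!bo)
       bo = alloc_fresh_bo(bufmgr, size);   /* GEM_CREATE + SET_DOMAIN, unlocked */

    if (bo->gtt_offset == 0ull) {
       /* 2. re-lock just for VMA; a cached BO from the right memzone
        * already has an address and skips this. */
       mtx_lock(&bufmgr->lock);
       bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 1);
       mtx_unlock(&bufmgr->lock);
    }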
|
| |
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO. Simplifies the flow a bit.
Reviewed-by: Chris Wilson <[email protected]>
|
| |
Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.
To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache. We're about to
continue the search with the next BO. If that one's purged, we'll
clean it up too. And so on.
We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine. We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.
Fixes: 6244da8e23e ("iris: Dig through the cache to find a BO in the right memzone")
Reviewed-by: Chris Wilson <[email protected]>
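A sketch of why removing only the current entry is safe (list API as in
Mesa's util/list.h; bo_free is a hypothetical helper name):

    /* list_for_each_entry_safe saves only the *next* pointer, so it
     * tolerates deleting the current entry and nothing more. */
    list_for_each_entry_safe(struct iris_bo, cur, &bucket->head, head) {
       if (!iris_bo_madvise(cur, I915_MADV_WILLNEED)) {
          /* The kernel purged this BO's pages: free just this entry and
           * keep walking.  Freeing later entries here (as the old
           * purge-bucket call could) would invalidate the saved pointer. */
          list_del(&cur->head);
          bo_free(cur);
          continue;
       }
       /* ...otherwise 'cur' is a reuse candidate... */
    }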
|
| |
This saves some util_vma thrash when the first entry in the cache
happens to be in a different memory zone, but one just a tiny bit
ahead is already there and instantly reusable. Hopefully the cost
of a little extra searching won't break the bank - if it does, we
can consider having separate list heads or keeping a separate VMA
cache.
Improves OglDrvRes performance by 22%, restoring a regression from
deleting the bucket allocators in 694d1a08d3e5883d97d5352895f8431f.
Thanks to Clayton Craft for alerting me to the regression.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
Buckets haven't been power-of-two sized in over a decade.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
It's not much, but we may as well hold the lock for a bit less time.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
There's enough going on here to warrant a helper. This also simplifies
the control flow and eliminates the last non-error-case goto.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
It is unlikely that we would fail to map a cached BO in order to zero
its contents. When we did, we would free the first BO in the cache and
try again with the second. It's possible that this next BO already had
a map setup, in which case we'd succeed. But if it didn't, we'd likely
fail again in the same manner.
There's not much point in optimizing this case (and frankly, if we're
out of CPU-side VMA we should probably dump the cache entirely)...so
instead, just fall back to allocating a fresh BO from the kernel which
will already be zeroed so we don't have to try and map it.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
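A sketch of the fallback (helper names hypothetical):

    if (bo && zeroed) {
       void *map = iris_bo_map(NULL, bo, MAP_WRITE | MAP_RAW);
       if (map) {
          memset(map, 0, bo->size);
       } else {
          /* Don't pitch cache entries and retry; a fresh kernel BO is
           * already zeroed and needs no CPU map at all. */
          bo_free(bo);
          bo = NULL;
       }
    }
    if (!bo)
       bo = alloc_fresh_bo(bufmgr, bo_size);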
|
| |
There's enough going on here to warrant a helper. More cleaning coming.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
In the cached case, we would retry the allocation on failure, pitching
one BO from the cache each time. This is silly, because the only time
it should fail is if the tiling or stride parameters are unacceptable,
which has nothing to do with the particular BO in question. So there's
no point in retrying - we should simply fail the allocation.
This patch moves both calls to bo_set_tiling_internal() below the
cache/fresh split, so the call happens in one place instead of two.
To preserve the ordering between SET_TILING and SET_DOMAIN, we move
that below as well. (I am unsure if the order matters.)
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
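A sketch of the resulting flow (helper names hypothetical):

    /* One tiling/stride check after the cache-vs-fresh split; if the
     * kernel rejects the parameters, no other cached BO would fare any
     * better, so fail the allocation instead of retrying. */
    struct iris_bo *bo = find_cached_bo(bucket, memzone);
    if (!bo)
       bo = alloc_fresh_bo(bufmgr, bo_size);

    if (bo_set_tiling_internal(bo, tiling_mode, stride) != 0) {
       bo_free(bo);
       return NULL;
    }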
|
| |
We mark snooped BOs as non-reusable, so we never return them to the
cache. This means that we'd need to call I915_GEM_SET_CACHING to make
any BO we find in the cache snooped. But then again, any BO we freshly
allocate from the kernel will also be non-snooped, so it has the same
issue. There's really no reason to skip the cache - we may as well use
it to avoid the I915_GEM_CREATE overhead.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
util_vma needs to be protected by a lock. All other callers of
vma_alloc and vma_free appear to be holding a lock already.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock.
We can simply set the bucket to NULL and it will skip the cache without
goto, and without messing up locking.
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
|
| |
When we hit a GPU hang, we failed to reset Surface State Base Address
right away, and would keep hanging until we filled up the binder. Then
we'd finally get it right after a lot of repeated stumbles. Update it
right away so we hopefully hang fewer times before succeeding.
|
| |
Shader-db results on Kaby Lake:
total instructions in shared programs: 15306230 -> 15304726 (<.01%)
instructions in affected programs: 4570 -> 3066 (-32.91%)
helped: 16
HURT: 0
total cycles in shared programs: 361703436 -> 361680041 (<.01%)
cycles in affected programs: 129388 -> 105993 (-18.08%)
helped: 16
HURT: 0
LOST: 0
GAINED: 2
The helped programs were in XCom 2, Deus Ex: Mankind Divided, and
Kerbal Space Program.
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
It will be true for the constant/system value buffer because they use a
constant zero but it's not true in general. If we ever got here when
the source wasn't constant, nir_src_as_uint would assert.
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: [email protected]
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
| |
This is non-destructive and leaves the debugging information in place.
Reviewed-by: Jason Ekstrand <[email protected]>
|
| |
Jason pointed out that we don't need to keep an entire copy of the
serialized NIR around, we just need the SHA1. This does change our
disk cache key to a SHA1 of a SHA1, which is a bit odd, but it should
work out, be faster, and use less memory.
Reviewed-by: Jason Ekstrand <[email protected]>
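A sketch of the idea, assuming an ish->nir_sha1 field (blob and SHA1
helpers as in Mesa's util):

    struct blob blob;
    blob_init(&blob);
    nir_serialize(&blob, ish->nir);
    /* Keep only the 20-byte SHA1 instead of the whole serialized copy. */
    _mesa_sha1_compute(blob.data, blob.size, ish->nir_sha1);
    blob_finish(&blob);
    /* The disk cache key then hashes this SHA1 together with the
     * compile keys: a SHA1 of a SHA1. */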
|
| |
We already flag IRIS_DIRTY_URB when we change it, but we were
additionally flagging it on every BLORP operation, even when it hadn't
changed.
|
| |
This lets us advertise GL_EXT_shader_framebuffer_fetch and
GL_KHR_blend_equation_advanced_coherent support.
|
| |
TGSI's FBFETCH instruction currently only supports reading from a single
render target, but NIR intrinsics can support multiple render targets.
radeonsi can only support fetching from RT 0, but other drivers may be
able to support fetching from any render target.
To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply
PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?"
to an integer number of render targets which can be fetched.
Reviewed-by: Marek Olšák <[email protected]>
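A sketch of the new semantics from a state-tracker perspective:

    /* PIPE_CAP_FBFETCH is now a count of fetchable render targets,
     * not a boolean. */
    unsigned fbfetch_rts = screen->get_param(screen, PIPE_CAP_FBFETCH);
    if (fbfetch_rts >= 1) {
       /* at least RT 0 can be fetched (radeonsi's level of support);
        * a driver that can fetch any RT reports its full MRT count. */
    }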
|
| |
Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned
out that we were sometimes placing variable length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing. So
this does more than improve the decoder output.
Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer. Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size. We don't need to catch every single place, however, since
it's only important to record variable length packets (like viewports
and binding tables).
State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either. (We also don't have a callback when an upload buffer is
released.) So, this tracking may space leak over time. That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.
The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris. However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.
We can always improve this, but for now it's better than what we had.
|
| |
Fixes: 4756864cdc5 ("iris: Start wiring up on-disk shader cache")
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
| |
We were checking this based on nir->info.name, but with the shader
cache enabled, nir_strip throws out the name, causing us to use IEEE
mode for ARB programs.
gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior.
Fixes: dc5dc727d59 ("iris: Serialize the NIR to a blob we can use for shader cache purposes")
|
| |
This lets st/nir cache the NIR for shaders, based on the shader source
string hash, allowing us to skip initial compiles altogether, and also
letting us start from there should we need to recompile for NOS.
Reviewed-by: Dylan Baker <[email protected]>
|