Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This prepass, run after scheduling but before RA, rewrites values to
pipeline registers where possible. It walks the IR, checking whether
each value is ever read outside of the bundle in which it is written.
If it is not, it is rewritten to a pipeline register (r24 or r25),
valid only within that bundle. This has theoretical benefits for power
consumption and register pressure (and, by extension, performance).
While this is tested to work, it's not clear how much of a win it
really is, especially without an out-of-order scheduler (yet!).
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
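
A minimal sketch of the idea, with illustrative types and helpers
rather than the real MIR structures:

    #include <stdbool.h>

    #define MAX_SRCS 3
    #define PIPELINE_BASE 24 /* r24 and r25 */

    struct ins { unsigned dest; unsigned src[MAX_SRCS]; };
    struct bundle { struct ins **ins; unsigned count; };

    /* Does any instruction in `b` read `value`? */
    static bool
    bundle_reads(const struct bundle *b, unsigned value)
    {
        for (unsigned j = 0; j < b->count; ++j)
            for (unsigned s = 0; s < MAX_SRCS; ++s)
                if (b->ins[j]->src[s] == value)
                    return true;
        return false;
    }

    /* Is `value` ever read outside of bundle `b`? */
    static bool
    used_outside_bundle(const struct bundle *b, unsigned value,
                        const struct bundle *bundles, unsigned n)
    {
        for (unsigned i = 0; i < n; ++i)
            if (&bundles[i] != b && bundle_reads(&bundles[i], value))
                return true;
        return false;
    }

    /* Rewrite values that never escape their bundle to r24/r25. */
    static void
    specialize_pipeline_registers(struct bundle *bundles, unsigned n)
    {
        for (unsigned i = 0; i < n; ++i) {
            unsigned next = 0; /* at most two pipeline registers */
            for (unsigned j = 0; j < bundles[i].count && next < 2; ++j) {
                struct ins *ins = bundles[i].ins[j];
                if (used_outside_bundle(&bundles[i], ins->dest, bundles, n))
                    continue;
                unsigned preg = PIPELINE_BASE + next++;
                /* Retarget intra-bundle reads, then the write itself. */
                for (unsigned k = j + 1; k < bundles[i].count; ++k)
                    for (unsigned s = 0; s < MAX_SRCS; ++s)
                        if (bundles[i].ins[k]->src[s] == ins->dest)
                            bundles[i].ins[k]->src[s] = preg;
                ins->dest = preg;
            }
        }
    }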
|
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
First, this moves the scheduler and emitter out of midgard_compile.c
into their own dedicated files.
More interestingly, this slims down midgard_bundle to be essentially an
array of _pointers_ to midgard_instructions (plus some bundling
metadata), rather than the instructions and packing themselves. The
difference is critical: it means that (within reason, i.e. as long as
it doesn't affect the schedule) midgard_instructions can now be modified
_after_ scheduling, with the changes reflected in the final binary.
On a more philosophical level, this removes an IR. Previously, the IR
before scheduling (MIR) was separate from the IR after scheduling
(post-schedule MIR), requiring a separate set of utilities to traverse
it, using different idioms. There was no good reason for this, and it
restricted our flexibility in the RA. So unify all the things!
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
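
In miniature, the structural change looks like this (stand-in
definitions; the real bundle also tracks things like the tag, padding
and embedded constants):

    struct midgard_instruction { int op; /* ...lots more state... */ };

    #define MAX_BUNDLE 6

    /* Before: the bundle owned copies, so the instructions were
     * frozen at schedule time. */
    struct old_bundle {
        struct midgard_instruction instructions[MAX_BUNDLE];
        unsigned instruction_count;
    };

    /* After: the bundle points into the MIR block list, so edits made
     * after scheduling are picked up when the bundle is emitted. */
    struct new_bundle {
        struct midgard_instruction *instructions[MAX_BUNDLE];
        unsigned instruction_count;
    };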
|
Trivial.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These are more generally useful than the files they were constrained to.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
Mostly, this fixes a number of instances of lines >> 80 chars,
refactoring them into something legible.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This represents a major break with the former RA design. We now use
conflicting register classes to represent the subdivision of Midgard's
128-bit registers into varying sizes and arrangements. We determine the
class based on the number of components in the instructions' masks. To support
this, we include a number of helpers in the RA to allow composing
swizzles and masks, such that MIR written implicitly assuming .xyzw
sources can be transformed to use actual (non-aligned) sources.
The net result is a marked decrease in register pressure on
non-vec4-exclusive shaders. We could still be doing much better. Not
implemented yet are:
- Register spilling
- Per-component liveness
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
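
The swizzle/mask helpers are the interesting part; a simplified sketch
(2 bits per component, 4 components, as in the Midgard ISA encoding):

    /* Compose swizzles so the result applies `first`, then `second`:
     * out[c] = first[second[c]]. This is how MIR written against .xyzw
     * gets retargeted onto whatever components RA actually assigned. */
    static unsigned
    compose_swizzle(unsigned first, unsigned second)
    {
        unsigned out = 0;
        for (unsigned c = 0; c < 4; ++c) {
            unsigned s = (second >> (2 * c)) & 3;
            out |= ((first >> (2 * s)) & 3) << (2 * c);
        }
        return out;
    }

    /* Register class from mask size (assumes a non-empty mask):
     * 1 component -> scalar class, 2 -> vec2, 3 -> vec3, 4 -> vec4. */
    static unsigned
    class_for_mask(unsigned mask)
    {
        return __builtin_popcount(mask & 0xF) - 1;
    }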
|
These masks distinguish scalar/vec2/vec3 loads from the default vec4,
which helps with assembly readability (since it's immediately obvious
how many components are _actually_ affected, rather than doing
mysterious things to an unknown number of unused components). Later in
the series, this will enable smarter register allocation, as the unused
components will not be interpreted abnormally.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
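
For example, a disassembler can derive the suffix straight from the
mask (illustrative, not the real disassembler):

    #include <stdio.h>

    /* Prints ".xy" for a vec2 mask instead of pretending it is .xyzw. */
    static void
    print_mask(FILE *fp, unsigned mask)
    {
        fputc('.', fp);
        for (unsigned c = 0; c < 4; ++c) {
            if (mask & (1u << c))
                fputc("xyzw"[c], fp);
        }
    }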
|
This fixes liveness analysis with respect to inline constants and
branching. In practice, the symptom is abnormally high register
pressure.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These snippets of integer assembly are injected for various purposes.
Eventually, we'll want to implement these in NIR directly. Regardless,
the "default" output modifier is different between floats and ints, so
let's set the right one.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
These were static to midgard_compile.c but are more generally useful
across the compiler.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
This mechanism is only used by blend shaders, so just use a move here.
Ideally, it'll be copy-propped and DCE'd away; this removes a source of
considerable indirection and will simplify RA logic.
Signed-off-by: Alyssa Rosenzweig <[email protected]>
Reviewed-by: Ryan Houdek <[email protected]>
|
Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.
Reviewed-by: Marek Olšák <[email protected]>
|
Clear w/ quad uses a normal draw, which gets counted by any active
occlusion query (OQ). st/meta uses set_active_query_state(..) to tell
the driver to pause queries in such cases.
Fixes the spec@arb_occlusion_query@occlusion_query_meta_save piglit test.
Signed-off-by: Christian Gmeiner <[email protected]>
|
Mesa measures in DWords. The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes. So, convert to bytes.
Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.
Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
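
The conversion itself is a single scale; roughly (not the exact driver
code):

    #include <assert.h>
    #include <stdint.h>

    /* SO_WRITE_OFFSET holds a byte address in bits 31:2 (1:0 MBZ), so
     * scale Mesa's DWord counts up to bytes before programming it. */
    static uint32_t
    so_write_offset_bytes(uint32_t dwords)
    {
        uint32_t bytes = dwords * 4;
        assert((bytes & 3) == 0); /* documents the 1:0 MBZ constraint */
        return bytes;
    }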
|
The automatic header generation unifies identical registers in a series
and only emits definitions for the first one. This is mostly to avoid
emitting excessive definitions for CB registers, but special-casing
an exception for this family of registers doesn't seem worth it.
|
The definition of the fields differs, but PITCH_GFX9 is a mere extension
of PITCH_GFX6 that does not conflict with any other fields.
This aligns the definitions with what will be generated from the
register JSON.
The information about how large the fields really are is preserved in
the register database.
|
The field layout wasn't actually changed in gfx9, so having the suffix
isn't very useful. The field *contents* were changed, but this is
reflected in the V_xxx_xxx definitions and is taken into account by
the ac_debug logic based on the register JSON.
This aligns the definitions with what will be generated from the
register JSON.
|
Don't have a separate mechanism for removing NIR constants from the
table; if unused, we will compact them away. The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Reviewed-by: Kenneth Graunke <[email protected]>
|
Change the iris_binding_table to keep track of which surfaces are
actually going to be used, then assign binding table indices just for
those. Reducing unused bytes matters because we use a reduced space
for those tables in Iris.
The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions. The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.
The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.
v2: (all from Ken)
Use the BITFIELD64_MASK helper. Improve comments.
Assert that the whole group is marked as used when we have indirects.
Reviewed-by: Kenneth Graunke <[email protected]>
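
A sketch of the compaction: per group, track used surfaces in a
bitmask, and let a surface's BTI be the number of used surfaces before
it. The real iris helpers differ in detail; SURFACE_NOT_USED here
stands in for IRIS_SURFACE_NOT_USED.

    #include <stdint.h>

    #define SURFACE_NOT_USED 0xff

    /* Group-relative index (e.g. UBO #2) -> compacted BTI. */
    static uint8_t
    group_index_to_bti(uint64_t used, unsigned index)
    {
        if (!(used & (1ull << index)))
            return SURFACE_NOT_USED;
        return __builtin_popcountll(used & ((1ull << index) - 1));
    }

    /* Compacted BTI -> group-relative index. */
    static unsigned
    bti_to_group_index(uint64_t used, unsigned bti)
    {
        for (unsigned i = 0; used; ++i, used >>= 1) {
            if ((used & 1) && bti-- == 0)
                return i;
        }
        return SURFACE_NOT_USED;
    }

The `(1ull << index) - 1` expression is the mask that the
BITFIELD64_MASK helper from the v2 note provides.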
|
This will make it convenient to handle compacting and printing the
binding table.
Reviewed-by: Kenneth Graunke <[email protected]>
|
Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perform the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using it to index
shs->constbuf.
Reviewed-by: Kenneth Graunke <[email protected]>
|
We'll change iris to perform lowering of the binding table indices
earlier (before the backend kicks in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.
Reviewed-by: Kenneth Graunke <[email protected]>
|
v2: define 0x100 and use it for setting FS_OUTPUT_REG.HALF_PRECISION
Signed-off-by: Rob Clark <[email protected]>
|
Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.
The a6xx version is copied from the a5xx code but it has not been
tested.
v2: [Hyunjun Ko ([email protected])] Use the recently added half flag,
which represents precision based on IR3_REG_HALF, to avoid duplication.
Signed-off-by: Rob Clark <[email protected]>
|
Commit a1378639ab19 reordered context function initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.
In this case sctx->dma_copy was assigned a value after being used in:
sctx->b.resource_copy_region = sctx->dma_copy;
This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.
See https://bugs.freedesktop.org/show_bug.cgi?id=110422
Signed-off-by: Marek Olšák <[email protected]>
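
The fix in miniature (the SDMA function name here is hypothetical; the
point is purely the ordering):

    /* Broken: the override read sctx->dma_copy while it was still NULL. */
    if (sscreen->debug_flags & DBG(FORCE_DMA))
        sctx->b.resource_copy_region = sctx->dma_copy; /* NULL! */
    /* ... */
    sctx->dma_copy = si_sdma_copy_region; /* assigned too late */

    /* Fixed: initialize dma_copy first, then apply the debug override. */
    sctx->dma_copy = si_sdma_copy_region;
    if (sscreen->debug_flags & DBG(FORCE_DMA))
        sctx->b.resource_copy_region = sctx->dma_copy;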
|
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
Now that we have the util function for the default values, we can get
rid of the boilerplate.
Signed-off-by: Christian Gmeiner <[email protected]>
|
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Erico Nunes <[email protected]>
Tested-by: Erico Nunes <[email protected]>
Tested-by: Andreas Baierl <[email protected]>
|
Reviewed-by: Jordan Justen <[email protected]>
|
Removes the bool_to_float logic from the int_to_float pass, so that the
two can be used separately. Having separate passes gives us better
validation and makes it possible to use int lowering together with the
lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc
generates bools; ftrunc lowering should probably be reworked). For now
we always expect lower_bool to come after lower_int.
Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
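
A sketch of the conversion cases in nir_builder terms: once integers
live in float registers, float-to-int is truncation toward zero and
int-to-float is a no-op (pass boilerplate omitted):

    #include "nir.h"
    #include "nir_builder.h"

    static nir_ssa_def *
    lower_int_conversion(nir_builder *b, nir_op op, nir_ssa_def *src)
    {
        switch (op) {
        case nir_op_f2i32:
        case nir_op_f2u32:
            return nir_ftrunc(b, src); /* result stays in a float reg */
        case nir_op_i2f32:
        case nir_op_u2f32:
            return src; /* already represented as a float */
        default:
            return NULL; /* not handled in this sketch */
        }
    }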
|
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.
Signed-off-by: Jonathan Marek <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
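
The lowering itself is tiny; sketched with nir_builder, a left shift by
a constant becomes a multiply by the matching power of two:

    #include "nir.h"
    #include "nir_builder.h"

    /* ishl by a constant -> imul, so int_to_float never sees a shift. */
    static nir_ssa_def *
    lower_ishl_by_const(nir_builder *b, nir_ssa_def *x, unsigned shift)
    {
        return nir_imul(b, x, nir_imm_int(b, (int)(1u << shift)));
    }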
|
We could have identical texture state for both VS and FS, which would
result in VS state getting created first and FS state mapping to the
identical cmdstream, so VS state got emitted twice and no FS state was
emitted.
Fixes:
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels
Signed-off-by: Rob Clark <[email protected]>
Acked-by: Eric Anholt <[email protected]>
|
And instead, link them as they are added.
Makes things a bit clearer and prepares future work such as FB reload
jobs.
Signed-off-by: Tomeu Vizoso <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
|
Signed-off-by: Tomeu Vizoso <[email protected]>
Acked-by: Alyssa Rosenzweig <[email protected]>
|
A bunch of tests have been fixed, but some regressions have appeared on
T760.
Signed-off-by: Tomeu Vizoso <[email protected]>
|
We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.
Reviewed-by: Timothy Arceri <[email protected]>
|
location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.
Reviewed-by: Timothy Arceri <[email protected]>
|
ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.
Reviewed-by: Timothy Arceri <[email protected]>
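
Roughly, with hypothetical helpers standing in for the real ac
plumbing:

    #include <llvm-c/Core.h>

    #define MIN2(a, b) ((a) < (b) ? (a) : (b))

    struct ctx; /* stand-in for the translation context */
    LLVMValueRef build_varying_addr(struct ctx *ctx, unsigned vec4_slot);
    void store_components(struct ctx *ctx, LLVMValueRef addr,
                          LLVMValueRef *vals, unsigned n);

    static void
    store_output(struct ctx *ctx, LLVMValueRef *values,
                 unsigned base_slot, unsigned num_components)
    {
        for (unsigned slot = 0; slot * 4 < num_components; ++slot) {
            unsigned n = MIN2(num_components - slot * 4, 4u);
            /* The second vec4 slot is not adjacent in memory, so
             * recompute the address rather than offsetting the first. */
            LLVMValueRef addr = build_varying_addr(ctx, base_slot + slot);
            store_components(ctx, addr, values + slot * 4, n);
        }
    }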
|
Signed-off-by: Christian Gmeiner <[email protected]>
|
Signed-off-by: Christian Gmeiner <[email protected]>
|
No significant changes in the code were needed to enable the extension,
just updates to the SWR capabilities and the documentation.
Reviewed-by: Alok Hota <[email protected]>
|
Removing unused but problematic code from the simdlib header to fix a
compilation problem on 32-bit Linux.
Reviewed-by: Alok Hota <[email protected]>
|
We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA
We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages. The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages,
now we avoid our own global lock as well.
We do have to re-lock around VMA. Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.
Reviewed-by: Chris Wilson <[email protected]>
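
The resulting flow, simplified (cached_bo_for_size() stands in for the
bucket search; the other names approximate iris_bufmgr.c):

    #include "c11/threads.h"

    static struct iris_bo *
    bo_alloc_internal(struct iris_bufmgr *bufmgr, uint64_t size,
                      enum iris_memory_zone memzone)
    {
        /* 1. Rummage through the cache: lock held. */
        mtx_lock(&bufmgr->lock);
        struct iris_bo *bo = cached_bo_for_size(bufmgr, size, memzone);
        mtx_unlock(&bufmgr->lock);

        if (!bo) {
            /* GEM_CREATE + SET_DOMAIN: no global lock needed. */
            bo = alloc_fresh_bo(bufmgr, size);
            if (!bo)
                return NULL;

            /* 2. Allocate VMA: re-take the lock. Hopefully rare,
             * since a cached BO in the right memzone avoids this. */
            mtx_lock(&bufmgr->lock);
            bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 1);
            mtx_unlock(&bufmgr->lock);
        }

        return bo;
    }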
|
Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO. Simplifies the flow a bit.
Reviewed-by: Chris Wilson <[email protected]>
|
Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.
To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache. We're about to
continue the search with the next BO. If that one's purged, we'll
clean it up too. And so on.
We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine. We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.
Fixes: 6244da8e23e iris: Dig through the cache to find a BO in the right memzone
Reviewed-by: Chris Wilson <[email protected]>
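
The pitfall, in util/list.h terms: the _safe iterator caches only the
next node, so unlinking the current entry is fine, while unlinking
nodes ahead of the cursor (as the bucket purge effectively did) can
free that cached pointer. A sketch:

    #include <stdbool.h>
    #include <stdlib.h>
    #include "util/list.h"

    struct entry {
        struct list_head link;
        bool purged;
    };

    static void
    prune_purged(struct list_head *head)
    {
        list_for_each_entry_safe(struct entry, e, head, link) {
            if (e->purged) {
                /* OK: only the current entry is removed; the
                 * iterator's saved next pointer stays valid. */
                list_del(&e->link);
                free(e);
            }
            /* Not OK: also freeing entries beyond `e` here could
             * free the saved next pointer and break the walk. */
        }
    }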