path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* panfrost/midgard: Cleanup copy propagation
  Alyssa Rosenzweig, 2019-06-04 (1 file, -11/+4)

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
* panfrost/midgard: Implement "pipeline register" prepass
  Alyssa Rosenzweig, 2019-06-04 (4 files, -2/+96)

  This prepass, run after scheduling but before RA, specializes to pipeline
  registers where possible. It walks the IR, checking whether sources are
  ever used outside of the immediate bundle in which they are written. If
  they are not, they are rewritten to a pipeline register (r24 or r25),
  valid only within the bundle itself.

  This has theoretical benefits for power consumption and register pressure
  (and performance by extension). While this is tested to work, it's not
  clear how much of a win it really is, especially without an out-of-order
  scheduler (yet!).

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
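  The walk described above can be sketched in a few lines of C. This is a
  minimal illustration, not the actual Midgard pass: the `insn`/`bundle`
  structs and the two-register limit are stand-ins, and a real pass would
  also skip values with no uses at all rather than promote them.

  ```c
  #include <stdbool.h>
  #include <stddef.h>

  #define PIPELINE_BASE 24   /* r24, r25 are the two pipeline registers */
  #define MAX_INSNS 8

  typedef struct { int dest; int src[2]; } insn;           /* -1 = unused */
  typedef struct { insn insns[MAX_INSNS]; size_t count; } bundle;

  /* Is `reg` read by any instruction outside bundle `home`? */
  static bool
  used_outside_bundle(const bundle *bundles, size_t n, size_t home, int reg)
  {
     for (size_t b = 0; b < n; ++b) {
        if (b == home) continue;
        for (size_t i = 0; i < bundles[b].count; ++i)
           for (int s = 0; s < 2; ++s)
              if (bundles[b].insns[i].src[s] == reg)
                 return true;
     }
     return false;
  }

  /* Rewrite bundle-local values to pipeline registers; returns how many
   * values were promoted. */
  static int
  promote_pipeline_regs(bundle *bundles, size_t n)
  {
     int promoted = 0;
     for (size_t b = 0; b < n; ++b) {
        int next_pipe = PIPELINE_BASE;
        for (size_t i = 0; i < bundles[b].count; ++i) {
           int reg = bundles[b].insns[i].dest;
           if (reg < 0 || next_pipe > PIPELINE_BASE + 1)
              continue;   /* no dest, or both pipeline regs taken */
           if (used_outside_bundle(bundles, n, b, reg))
              continue;   /* value escapes the bundle: keep it in a GPR */
           int pipe = next_pipe++;
           bundles[b].insns[i].dest = pipe;
           /* Rewrite every in-bundle use of the old register. */
           for (size_t j = i + 1; j < bundles[b].count; ++j)
              for (int s = 0; s < 2; ++s)
                 if (bundles[b].insns[j].src[s] == reg)
                    bundles[b].insns[j].src[s] = pipe;
           promoted++;
        }
     }
     return promoted;
  }
  ```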
* panfrost/midgard: Helpers for pipeline
  Alyssa Rosenzweig, 2019-06-04 (5 files, -9/+79)

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Refactor schedule/emit pipeline
  Alyssa Rosenzweig, 2019-06-04 (6 files, -707/+744)

  First, this moves the scheduler and emitter out of midgard_compile.c
  into their own dedicated files.

  More interestingly, this slims down midgard_bundle to be essentially an
  array of _pointers_ to midgard_instructions (plus some bundling
  metadata), rather than the instructions and packing themselves. The
  difference is critical, as it means that (within reason, i.e. as long as
  it doesn't affect the schedule) midgard_instructions can now be modified
  _after_ scheduling while having changes updated in the final binary.

  On a more philosophical level, this removes an IR. Previously, the IR
  before scheduling (MIR) was separate from the IR after scheduling
  (post-schedule MIR), requiring a separate set of utilities to traverse,
  using different idioms. There was no good reason for this, and it
  restricts our flexibility with the RA. So unify all the things!

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
* panfrost/midgard: Cleanup RA (stylistic changes)
  Alyssa Rosenzweig, 2019-06-04 (1 file, -16/+30)

  Trivial.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Share MIR utilities
  Alyssa Rosenzweig, 2019-06-04 (2 files, -40/+46)

  These are more generally useful than the files they were constrained to.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Misc. cleanup for readability
  Alyssa Rosenzweig, 2019-06-04 (2 files, -15/+35)

  Mostly, this fixes a number of instances of lines >> 80 chars,
  refactoring them into something legible.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
* panfrost/midgard: Extend RA to non-vec4 sources
  Alyssa Rosenzweig, 2019-06-04 (1 file, -77/+278)

  This represents a major break with the former RA design. We now use
  conflicting register classes to represent the subdivision of Midgard's
  128-bit registers into varying sizes and arrangement. We determine class
  based on the number of components in the instructions' masks.

  To support this, we include a number of helpers in the RA to allow
  composing swizzles and masks, such that MIR written implicitly assuming
  .xyzw sources can be transformed to use actual (non-aligned) sources.

  The net result is a marked decrease in register pressure on
  non-vec4-exclusive shaders. We could still be doing much better.
  Not implemented yet are:

  - Register spilling
  - Per-component liveness

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
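  Determining a class from the mask, as described above, amounts to a
  popcount of the write mask's components. A minimal sketch, with made-up
  class names (the real RA's classes and conflict graph are more involved):

  ```c
  /* One class per value size; smaller classes can pack into fewer
   * components of a 128-bit register. */
  enum reg_class { CLASS_SCALAR, CLASS_VEC2, CLASS_VEC3, CLASS_VEC4 };

  /* Count the components touched by a 4-bit write mask. */
  static int mask_components(unsigned mask)
  {
     return __builtin_popcount(mask & 0xF);
  }

  static enum reg_class class_for_mask(unsigned mask)
  {
     switch (mask_components(mask)) {
     case 1:  return CLASS_SCALAR;
     case 2:  return CLASS_VEC2;
     case 3:  return CLASS_VEC3;
     default: return CLASS_VEC4;
     }
  }
  ```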
* panfrost/midgard: Set masks on ld_vary
  Alyssa Rosenzweig, 2019-06-04 (1 file, -1/+3)

  These masks distinguish scalar/vec2/vec3 loads from the default vec4,
  which helps with assembly readability (since it's immediately obvious
  how many components are _actually_ affected, rather than doing
  mysterious things to an unknown number of unused components). Later in
  the series, this will enable smarter register allocation, as the unused
  components will not be interpreted abnormally.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Fix liveness analysis bugs
  Alyssa Rosenzweig, 2019-06-04 (1 file, -2/+8)

  This fixes liveness analysis with respect to inline constants and
  branching. In practice, the symptom is abnormally high register
  pressure.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
* panfrost/midgard: Set int outmod for "pasted" code
  Alyssa Rosenzweig, 2019-06-04 (1 file, -0/+4)

  These snippets of integer assembly are injected for various purposes.
  Eventually, we'll want to implement these in NIR directly. Regardless,
  the "default" output modifier is different between floats and ints, so
  let's set the right one.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Hoist some utility functions
  Alyssa Rosenzweig, 2019-06-04 (3 files, -64/+71)

  These were static to midgard_compile.c but are more generally useful
  across the compiler.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>

* panfrost/midgard: Remove pinning
  Alyssa Rosenzweig, 2019-06-04 (2 files, -27/+2)

  This mechanism is only used by blend shaders, so just use a move here.
  Ideally, it'll be copy-propped and DCE'd away; this removes a source of
  considerable indirection and will simplify RA logic.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Ryan Houdek <[email protected]>
* radeonsi/nir: Fix type in bindless address computation
  Connor Abbott, 2019-06-04 (1 file, -2/+2)

  Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.

  Reviewed-by: Marek Olšák <[email protected]>

* etnaviv: implement set_active_query_state(..) for hw queries
  Christian Gmeiner, 2019-06-04 (1 file, -1/+10)

  Clear w/ quad uses a normal draw which adds up to OQ. st/meta uses
  set_active_query_state(..) to tell the driver to pause queries in such
  cases. Fixes spec@arb_occlusion_query@occlusion_query_meta_save piglit.

  Signed-off-by: Christian Gmeiner <[email protected]>
* iris: Fix SO stride units for DrawTransformFeedback
  Kenneth Graunke, 2019-06-03 (2 files, -2/+2)

  Mesa measures in DWords. The hardware also claims to measure in DWords.
  Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
  Which means that it really measures in bytes. So, convert to bytes.

  Without this, our offset / stride denominator was 1/4th the size it
  should be, leading to 4x the vertex count that we should have had.

  Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers
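  The unit mismatch above is simple arithmetic. A sketch of the derived
  vertex count (the function name is illustrative): with the stride
  converted to bytes the division is consistent; leaving it in DWords
  makes the denominator 1/4th the size, so the result comes out 4x too
  large.

  ```c
  /* SO_WRITE_OFFSET is bits 31:2 with 1:0 MBZ, i.e. a byte count, while
   * Mesa tracks transform feedback strides in DWords. */
  static unsigned
  so_vertex_count(unsigned write_offset_bytes, unsigned stride_dwords)
  {
     unsigned stride_bytes = stride_dwords * 4;   /* the conversion */
     return write_offset_bytes / stride_bytes;
  }
  ```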
* amd/common: use generated register header
  Nicolai Hähnle, 2019-06-03 (8 files, -9/+6)

* amd/common: use SH{0,1}_CU_EN definitions only for COMPUTE_STATIC_THREAD_MGMT_SE0
  Nicolai Hähnle, 2019-06-03 (1 file, -5/+5)

  The automatic header generation unifies identical registers in a series
  and only emits definitions for the first one. This is mostly to avoid
  emitting excessive definitions for CB registers, but special-casing an
  exception for this family of registers doesn't seem worth it.

* amd/common: unify PITCH_GFX6 and PITCH_GFX9
  Nicolai Hähnle, 2019-06-03 (2 files, -7/+7)

  The definition of the fields differs, but PITCH_GFX9 is a mere extension
  of PITCH_GFX6 that does not conflict with any other fields. This aligns
  the definitions with what will be generated from the register JSON.

  The information about how large the fields really are is preserved in
  the register database.

* amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names
  Nicolai Hähnle, 2019-06-03 (2 files, -8/+8)

  The field layout wasn't actually changed in gfx9, so having the suffix
  isn't very useful. The field *contents* were changed, but this is
  reflected in the V_xxx_xxx definitions and is taken into account by the
  ac_debug logic based on the register JSON.

  This aligns the definitions with what will be generated from the
  register JSON.
* iris: Always reserve binding table space for NIR constants
  Caio Marcelo de Oliveira Filho, 2019-06-03 (2 files, -9/+14)

  Don't have a separate mechanism for NIR constants to be removed from the
  table. If unused, we will compact it away. The use_null_surface is
  needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.

  Reviewed-by: Kenneth Graunke <[email protected]>

* iris: Print binding tables when INTEL_DEBUG=bt
  Caio Marcelo de Oliveira Filho, 2019-06-03 (1 file, -0/+53)

  Reviewed-by: Kenneth Graunke <[email protected]>

* iris: Compact binding tables
  Caio Marcelo de Oliveira Filho, 2019-06-03 (3 files, -76/+234)

  Change the iris_binding_table to keep track of what surfaces are
  actually going to be used, then assign binding table indices just for
  those. Reducing unused bytes on those is valuable because we use a
  reduced space for those tables in Iris.

  The rest of the driver can go from "group indices" (i.e. UBO #2) to BTI
  and vice-versa using helper functions. The value IRIS_SURFACE_NOT_USED
  is returned to indicate a certain group index is not used or a certain
  BTI is not valid.

  The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be set
  to skip compacting the binding table.

  v2: (all from Ken)
      Use BITFIELD64_MASK helper.
      Improve comments.
      Assert the whole group is marked as used when we have indirects.

  Reviewed-by: Kenneth Graunke <[email protected]>
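  The group-index-to-BTI mapping described above can be sketched with a
  used-bitmask and a popcount: a surface's compact index is the number of
  used entries before it. This is an illustration under assumed names,
  not the actual iris helpers.

  ```c
  #include <stdint.h>

  #define SURFACE_NOT_USED (~0u)   /* stand-in for IRIS_SURFACE_NOT_USED */

  typedef struct {
     uint64_t used_mask;   /* bit i set => group entry i is referenced */
  } surface_group;

  static void mark_used(surface_group *g, unsigned i)
  {
     g->used_mask |= UINT64_C(1) << i;
  }

  /* Group-local index -> compacted binding table index. */
  static unsigned group_index_to_bti(const surface_group *g, unsigned i)
  {
     uint64_t bit = UINT64_C(1) << i;
     if (!(g->used_mask & bit))
        return SURFACE_NOT_USED;
     /* Count used entries below this one. */
     return (unsigned) __builtin_popcountll(g->used_mask & (bit - 1));
  }
  ```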
* iris: Create an enum for the surface groups
  Caio Marcelo de Oliveira Filho, 2019-06-03 (3 files, -35/+45)

  This will make it convenient to handle compacting and printing the
  binding table.

  Reviewed-by: Kenneth Graunke <[email protected]>

* iris: Handle binding table in the driver
  Caio Marcelo de Oliveira Filho, 2019-06-03 (6 files, -121/+232)

  Stop using brw_compiler to lower the final binding table indices for
  surface access. This is done by simply not setting the
  'prog_data->binding_table.*_start' fields. Then make the driver perform
  this lowering.

  This is a better place to perform the binding table assignments, since
  the driver has more information and will also later consume those
  assignments to upload resources.

  This also prepares us for two changes: use ibc without having to
  implement binding table logic there; and remove unused entries from the
  binding table.

  Since the `block` field in brw_ubo_range now refers to the final binding
  table index, we need to adjust it before using it to index
  shs->constbuf.

  Reviewed-by: Kenneth Graunke <[email protected]>

* iris: Pull brw_nir_analyze_ubo_ranges() call out of setup_uniforms
  Caio Marcelo de Oliveira Filho, 2019-06-03 (1 file, -3/+10)

  We'll change iris to perform lowering of the binding table indices
  earlier (before the backend kicks in), but the backend compiler uses the
  result of the analysis to identify load_ubo intrinsics, so we do the
  analysis after the lowering to have the right indices.

  Reviewed-by: Kenneth Graunke <[email protected]>
* freedreno/ir3: fix counting and printing for half registers
  Hyunjun Ko, 2019-06-03 (2 files, -2/+2)

  v2: defining 0x100 and use this for setting the
  FS_OUTPUT_REG.HALF_PRECISION

  Signed-off-by: Rob Clark <[email protected]>

* freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION
  Neil Roberts, 2019-06-03 (2 files, -6/+2)

  Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending on
  whether half_precision was set in the shader key. With support for
  mediump precision, it is possible to have different outputs use
  different precisions. That means we can't have a global shader state to
  specify it.

  Instead it now tries to copy the half-float-ness from the nir_variable
  for the output into the ir3_shader_variant. This is then used to decide
  whether to set half-precision for each output.

  The a6xx version is copied from the a5xx code but it has not been
  tested.

  v2. [Hyunjun Ko ([email protected])] There's the half flag recently
  added, which represents precision based on IR3_REG_HALF. Now use this
  flag to avoid duplication.

  Signed-off-by: Rob Clark <[email protected]>
* radeonsi: init sctx->dma_copy before using it
  Pierre-Eric Pelloux-Prayer, 2019-06-03 (1 file, -3/+3)

  Commit a1378639ab19 reordered context functions initializations but
  broke sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.

  In this case sctx->dma_copy was assigned a value after being used in:

     sctx->b.resource_copy_region = sctx->dma_copy;

  This commit moves the FORCE_DMA special case after sctx->dma_copy
  initialization.

  See https://bugs.freedesktop.org/show_bug.cgi?id=110422

  Signed-off-by: Marek Olšák <[email protected]>
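  The bug above is a plain ordering hazard: aliasing a function pointer
  before it is assigned copies NULL. A reduced sketch (struct and field
  names mirror the commit text but the surrounding code is invented):

  ```c
  #include <stddef.h>

  typedef void (*copy_fn)(void *dst, const void *src);

  static void dma_copy_impl(void *dst, const void *src)
  {
     (void)dst; (void)src;   /* body irrelevant to the ordering issue */
  }

  struct si_context {
     copy_fn dma_copy;
     copy_fn resource_copy_region;
  };

  static void init_context(struct si_context *sctx, int force_dma)
  {
     sctx->dma_copy = dma_copy_impl;                 /* assign first... */
     if (force_dma)                                  /* ...then alias */
        sctx->resource_copy_region = sctx->dma_copy;
     /* The buggy order did the force_dma aliasing before the
      * assignment, so resource_copy_region captured NULL. */
  }
  ```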
* ac: use amdgpu-flat-work-group-size
  Marek Olšák, 2019-06-03 (1 file, -5/+2)

  Reviewed-by: Bas Nieuwenhuizen <[email protected]>

* etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code
  Christian Gmeiner, 2019-06-03 (1 file, -157/+0)

  Now that we have the util function for the default values, we can get
  rid of the boilerplate.

  Signed-off-by: Christian Gmeiner <[email protected]>

* nir: copy intrinsic type when lowering load input/uniform and store output
  Jonathan Marek, 2019-06-03 (1 file, -0/+1)

  Fixes: c1275052 "nir: add type information to load uniform/input and
  store output intrinsics"

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Erico Nunes <[email protected]>
  Tested-by: Erico Nunes <[email protected]>
  Tested-by: Andreas Baierl <[email protected]>

* iris: Drop unused locals from iris_clear.c to avoid warning
  Caio Marcelo de Oliveira Filho, 2019-05-31 (1 file, -3/+0)

  Reviewed-by: Jordan Justen <[email protected]>
* nir: remove bool lowering from lower_int_to_float
  Jonathan Marek, 2019-05-31 (2 files, -0/+3)

  Removes the bool_to_float logic from the int_to_float pass, so that both
  can be used separately. By having separate passes we have better
  validation and it makes it possible to use with the lower_ftrunc option
  (int lowering generates ftrunc, but lower_ftrunc generates bools, ftrunc
  lowering should probably be reworked).

  For now we always expect lower_bool to come after lower_int. Also fixes
  f2i32 to become ftrunc and adds u2f/f2u cases.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>

* nir: add lower_bitshift option
  Jonathan Marek, 2019-05-31 (2 files, -0/+2)

  Add a "lower_bitshift" option, which disables optimizations introducing
  bitshifts and lowers ishl by constant to a multiply, so that we don't
  have to deal with bitshifts in int_to_float lowering.

  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
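  The lowering this option enables is a one-liner: a left shift by a
  constant equals a multiply by a power of two, and the multiply survives
  int-to-float lowering since floats have no shift operation. A sketch
  (the function name is illustrative, not the NIR pass):

  ```c
  /* ishl x, c  ->  imul x, (1 << c), for a constant shift amount c. */
  static int lower_ishl_by_const(int x, int c)
  {
     return x * (1 << c);
  }
  ```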
* freedreno/a6xx: add 'type' to shader state key
  Rob Clark, 2019-05-31 (2 files, -0/+2)

  We could have identical texture state for both VS and FS.. which would
  result in VS state getting created first, and FS state mapping to the
  identical cmdstream. Resulting in VS state getting emitted twice and no
  FS state emitted.

  Fixes:
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
    dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both

  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>

* freedreno/a6xx: fix GPU crash on small render targets
  Rob Clark, 2019-05-31 (1 file, -0/+7)

  Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels

  Signed-off-by: Rob Clark <[email protected]>
  Acked-by: Eric Anholt <[email protected]>
* panfrost: Remove link stage for jobs
  Tomeu Vizoso, 2019-05-31 (2 files, -68/+54)

  And instead, link them as they are added. Makes things a bit clearer and
  prepares future work such as FB reload jobs.

  Signed-off-by: Tomeu Vizoso <[email protected]>
  Reviewed-by: Alyssa Rosenzweig <[email protected]>

* panfrost: ci: Switch to kernel 5.2-rc2
  Tomeu Vizoso, 2019-05-31 (1 file, -4/+3)

  Signed-off-by: Tomeu Vizoso <[email protected]>
  Acked-by: Alyssa Rosenzweig <[email protected]>

* panfrost: ci: Update expectations
  Tomeu Vizoso, 2019-05-31 (1 file, -8/+3)

  A bunch of tests have been fixed, but some regressions have appeared on
  T760.

  Signed-off-by: Tomeu Vizoso <[email protected]>

* radeonsi/nir: Remove hack for builtins
  Connor Abbott, 2019-05-31 (1 file, -11/+2)

  We now bounds check properly in the uniform loading fast path, so
  there's no need to disable it by pretending there are other UBO bindings
  in use. The way this looks at the variable name was causing problems
  when two piglit shaders, one with a name that triggered the hack and one
  that didn't, got hashed to the same thing after stripping out the names.

  Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: Use correct location for uniform access bound
  Connor Abbott, 2019-05-31 (1 file, -1/+1)

  location is the API-level location, but driver_location is the actual
  location the uniform gets passed to the driver. This apparently only
  caused failures with builtins, where the location is 0 because it's
  represented via the state tokens instead.

  Reviewed-by: Timothy Arceri <[email protected]>

* radeonsi/nir: Correctly handle double TCS/TES varyings
  Connor Abbott, 2019-05-31 (1 file, -4/+28)

  ac expands the store to 32-bit components for us, but we still have to
  deal with storing up to 8 components, and when a varying is split
  across two vec4 slots we have to calculate the address again for the
  second slot, since they aren't adjacent in memory. I didn't do this on
  the ac level because we should generate better indexing arithmetic for
  the lds store, where slots are contiguous.

  Reviewed-by: Timothy Arceri <[email protected]>

* etnaviv: blt: s/TRUE/true && s/FALSE/false
  Christian Gmeiner, 2019-05-31 (1 file, -6/+6)

  Signed-off-by: Christian Gmeiner <[email protected]>

* etnaviv: rs: s/TRUE/true && s/FALSE/false
  Christian Gmeiner, 2019-05-31 (1 file, -8/+8)

  Signed-off-by: Christian Gmeiner <[email protected]>
* swr/rast: Enable ARB_GL_texture_buffer_range
  Jan Zielinski, 2019-05-30 (1 file, -1/+1)

  No significant changes in the code needed to enable the extension. Just
  updating SWR capabilities and the documentation.

  Reviewed-by: Alok Hota <[email protected]>

* swr/rast: fix 32-bit compilation on Linux
  Jan Zielinski, 2019-05-30 (1 file, -65/+0)

  Removing unused but problematic code from the simdlib header to fix a
  compilation problem on 32-bit Linux.

  Reviewed-by: Alok Hota <[email protected]>

* iris: Avoid holding the lock while allocating pages.
  Kenneth Graunke, 2019-05-30 (1 file, -5/+5)

  We only need the lock for:
  1. Rummaging through the cache
  2. Allocating VMA

  We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
  SET_DOMAIN to allocate the underlying pages. The idea behind calling
  SET_DOMAIN was to avoid a lock in the kernel while allocating pages;
  now we avoid our own global lock as well.

  We do have to re-lock around VMA. Hopefully this shouldn't happen too
  much in practice because we'll find a cached BO in the right memzone
  and not have to reallocate it.

  Reviewed-by: Chris Wilson <[email protected]>

* iris: Move SET_DOMAIN to alloc_fresh_bo()
  Kenneth Graunke, 2019-05-30 (1 file, -17/+15)

  Chris pointed out that the order between SET_DOMAIN and SET_TILING
  doesn't matter, so we can just do the page allocation when creating a
  new BO. Simplifies the flow a bit.

  Reviewed-by: Chris Wilson <[email protected]>
* iris: Be lazy about cleaning up purged BOs in the cache.
  Kenneth Graunke, 2019-05-29 (1 file, -17/+1)

  Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.

  list_for_each_entry_safe is safe against removing the current entry, but
  iris_bo_cache_purge_bucket was potentially removing next entries too,
  which broke our saved next pointer.

  To fix this, don't bother with the iris_bo_cache_purge_bucket step. We
  just detected a single entry where the kernel has purged the BO's
  memory, and so it isn't a usable entry for our cache. We're about to
  continue the search with the next BO. If that one's purged, we'll clean
  it up too. And so on.

  We may miss cleaning up purged BOs that are further down the list after
  non-purged BOs...but that's probably fine. We still have the time-based
  cleaner (cleanup_bo_cache) which will take care of them eventually, and
  the kernel's already freed their memory, so it's not that harmful to
  have a few kicking around a little longer.

  Fixes: 6244da8e23e iris: Dig through the cache to find a BO in the right memzone
  Reviewed-by: Chris Wilson <[email protected]>
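  The safe-iteration rule above can be illustrated with a tiny list: it is
  fine to unlink the node you are currently visiting, but reaching ahead
  and freeing other nodes invalidates a saved next pointer. A minimal
  sketch using a plain singly-linked list in place of the kernel-style
  list API (names illustrative):

  ```c
  #include <stdbool.h>
  #include <stdlib.h>

  struct bo { bool purged; struct bo *next; };

  /* Remove every purged BO, unlinking only the node currently being
   * visited at each step; no look-ahead, so no dangling saved pointer. */
  static int drop_purged(struct bo **head)
  {
     int dropped = 0;
     struct bo **link = head;
     while (*link) {
        struct bo *cur = *link;
        if (cur->purged) {
           *link = cur->next;   /* unlink only the current entry */
           free(cur);
           dropped++;
        } else {
           link = &cur->next;
        }
     }
     return dropped;
  }
  ```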