Commit message | Author | Age | Files | Lines
* iris: Re-enable param compaction | Jason Ekstrand | 2019-11-18 | 1 | -1/+1
  In d1c4e64a69e, we added a parameter to tell the back-end compiler to ignore the param array and just push however many constants you ask it to push. I enabled it for iris because this is really what iris wants but it seems to have caused a number of regressions. Revert to the old behavior for now.
  Fixes: d1c4e64a69e "intel/compiler: Add a flag to avoid compacting..."
* mesa: enable glthread for 7 Days To Die | Marek Olšák | 2019-11-18 | 1 | -0/+8
  Reviewed-by: Timothy Arceri <[email protected]>
* intel/compiler: Don't change hstride if not needed | Iván Briano | 2019-11-18 | 1 | -5/+6
  Alignment requirements may have changed the horizontal stride already, so don't set it if not required to avoid breaking said requirements.
  Fixes several tests such as dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t
  Signed-off-by: Iván Briano <[email protected]>
  Reviewed-by: Paulo Zanoni <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* turnip: add x11 wsi | Jonathan Marek | 2019-11-18 | 2 | -0/+114
  Copied from radv
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* turnip: add display wsi | Jonathan Marek | 2019-11-18 | 4 | -0/+366
  Copied from radv (minus the fence change)
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: Validate that variables are in the right lists | Jason Ekstrand | 2019-11-18 | 1 | -11/+15
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* etnaviv: blt: set TS dirty after clear | Jonathan Marek | 2019-11-18 | 1 | -0/+2
  RS engine does this already, it is missing for BLT engine. This fixes cases where a clear isn't immediately at the start of the frame.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: separate PE and RS formats, use RS only for tiling | Jonathan Marek | 2019-11-18 | 8 | -56/+54
  There are PE formats not supported by RS, so we can't have a single function to translate both. Use RS only for same formats until we have a translate_rs_format and test the possible different format blits.
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: blt: use only for tiling, and add missing formats | Jonathan Marek | 2019-11-18 | 1 | -22/+30
  * Removes the incorrect usage of translate_rs_format
  * Disables use of BLT engine for different src/dst format
  We only really need the BLT engine for tiling/detiling right now, but it would be nice to support as many blit cases as possible to avoid using PE for that.
  To deal with different formats we need to:
  * Have a translate_blt_format which has all supported formats
  * Fix the swizzle translation from gallium (current version was wrong)
  * Set the src/dst sRGB bits as needed
  * Find which type conversions the BLT engine can actually do
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Christian Gmeiner <[email protected]>
* Call shmget() with permission 0600 instead of 0777 | Brian Paul | 2019-11-18 | 3 | -3/+6
  A security advisory (TALOS-2019-0857/CVE-2019-5068) found that creating shared memory regions with permission mode 0777 could allow any user to access that memory. Several Mesa drivers use shared-memory XImages to implement back buffers for improved performance.
  This patch changes the shmget() calls to use 0600 (user r/w).
  Tested with legacy Xlib driver and llvmpipe.
  Cc: [email protected]
  Reviewed-by: Kristian H. Kristensen <[email protected]>
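The permission change can be illustrated with a minimal C sketch. It assumes a POSIX system with System V shared memory; the helper names `alloc_private_shm` and `shm_mode` are hypothetical and are not taken from the Mesa sources.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper (not Mesa code): create a System V shared memory
 * segment that only the owning user may read and write.
 * Mode 0600 is user r/w; the old 0777 mode let any user attach. */
static int alloc_private_shm(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Hypothetical helper: read back the permission bits of a segment.
 * Returns -1 if the segment cannot be queried. */
static int shm_mode(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) != 0)
        return -1;
    return ds.shm_perm.mode & 0777;
}
```

A caller would typically attach the segment with shmat() and mark it for removal with shmctl(id, IPC_RMID, NULL) once it is no longer needed.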
* anv: Emit a NULL vertex for zero base_vertex/instance | Jason Ekstrand | 2019-11-18 | 1 | -11/+16
  If both are zero (the common case), we can emit a null vertex buffer rather than emitting a vertex buffer with zeros in it. The packing of the VERTEX_BUFFER_STATE is faster because no relocation is emitted and we can avoid creating the vertex buffer which means one less anv_state_stream_alloc.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Use an anv_state for the next binding table | Jason Ekstrand | 2019-11-18 | 2 | -12/+15
  This is a bit more natural because we're already getting an anv_state most places in the pipeline. The important part here, however, is that we're no longer calling anv_block_pool_map on every alloc_binding_table call. While it's probably pretty cheap, it is potentially a linear walk over the list of BOs and it was showing up in profiles.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: More carefully dirty state in BindPipeline | Jason Ekstrand | 2019-11-18 | 7 | -25/+101
  Instead of blindly dirtying descriptors and push constants the moment we see a pipeline change, check to see if it actually changes the bind layout or push constant layout. This doubles the runtime performance of one CPU-limited example running with the Dawn WebGPU implementation when running on my laptop.
  NOTE: This effectively reverts beca63c6c07. While it was a nice optimization, it was based on prog_data and we can't do that anymore once we start allowing the same binding table to be used with multiple different pipelines.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: More carefully dirty state in BindDescriptorSets | Jason Ekstrand | 2019-11-18 | 4 | -22/+51
  Instead of dirtying all graphics or all compute based on binding point, we're now much more careful. We first check to see if the actual descriptor set changed and then only dirty the stages used by that descriptor set. For dynamic offsets, we keep a bitfield per-stage of which offsets are actually used in that stage and we only dirty push constants and descriptors if that stage has dynamic offsets AND those offsets actually change.
  Reviewed-by: Lionel Landwerlin <[email protected]>
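The per-stage bitfield idea can be sketched roughly as follows. This is an illustration only, not the anv implementation: the struct, the limits `MAX_STAGES` and `MAX_DYNAMIC_OFFSETS`, and the function name are all assumptions made for the example.

```c
#include <stdint.h>

#define MAX_STAGES 6            /* assumption: six shader stages */
#define MAX_DYNAMIC_OFFSETS 32  /* assumption: fits in one 32-bit mask */

/* Hypothetical state, loosely modelled on the description above. */
struct bind_state {
    uint32_t offsets[MAX_DYNAMIC_OFFSETS]; /* current dynamic offsets */
    uint32_t used_mask[MAX_STAGES];        /* per stage: offsets it reads */
};

/* Store new dynamic offsets and return a bitmask of stages to dirty.
 * Bit s is set only if stage s uses an offset whose value changed. */
static uint32_t update_dynamic_offsets(struct bind_state *st, uint32_t first,
                                       uint32_t count, const uint32_t *values)
{
    uint32_t changed = 0;

    for (uint32_t i = 0; i < count && first + i < MAX_DYNAMIC_OFFSETS; i++) {
        if (st->offsets[first + i] != values[i]) {
            st->offsets[first + i] = values[i];
            changed |= UINT32_C(1) << (first + i);
        }
    }

    uint32_t dirty_stages = 0;
    for (uint32_t s = 0; s < MAX_STAGES; s++) {
        if (st->used_mask[s] & changed)
            dirty_stages |= UINT32_C(1) << s;
    }
    return dirty_stages;
}
```

Rebinding the same offsets returns 0, so no stage is dirtied; a stage that never reads a changed offset is also left alone.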
* anv: Use a switch statement for binding table setup | Jason Ekstrand | 2019-11-18 | 1 | -117/+127
  It theoretically could be more efficient but the real point here is that it's no longer really a matter of dealing with special cases and then the "real" thing. The way we're handling binding tables, it's more of a multi-step process and a switch is more natural.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Rework push constant handling | Jason Ekstrand | 2019-11-18 | 11 | -228/+176
  This substantially reworks both the state setup side of push constant handling and the pipeline compile side. The fundamental change here is that we're no longer respecting the prog_data::param array and instead are just instructing the back-end compiler to leave the array alone. This makes the state setup side substantially simpler because we can now just memcpy the whole block of push constants and don't have to upload one DWORD at a time.
  This also means that we can compute the full push constant layout up-front and just trust the back-end compiler to not mess with it. Maybe one day we'll decide that the back-end compiler can do useful things there again but for now, this is functionally no different from what we had before this commit and makes the NIR handling cleaner.
  Reviewed-by: Lionel Landwerlin <[email protected]>
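The difference between the two upload styles can be sketched as below. This is a simplified model under stated assumptions: the function names and the flat `uint32_t` layout are hypothetical, and the real prog_data::param array carries more than a plain source index.

```c
#include <stdint.h>
#include <string.h>

#define PUSH_DWORDS 8  /* assumption: size of the example block */

/* Old style (sketch): a param array names, for each pushed DWORD, which
 * source DWORD to fetch, so the upload is one gather per DWORD. */
static void upload_with_param_array(uint32_t *dst, const uint32_t *src,
                                    const uint32_t *param, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        dst[i] = src[param[i]];
}

/* New style (sketch): the layout is fixed up front and left alone by the
 * compiler, so the whole block is copied in one call. */
static void upload_whole_block(uint32_t *dst, const uint32_t *src,
                               unsigned count)
{
    memcpy(dst, src, count * sizeof(uint32_t));
}
```

When the param array is the identity mapping, both functions produce the same bytes; the second simply avoids the per-DWORD indirection.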
* anv: Re-arrange push constant data a bit | Jason Ekstrand | 2019-11-18 | 3 | -23/+46
  This moves the compute stuff into an anv_push_constants::cs sub-struct. It also moves dynamic offsets into the push constants. This means we have to duplicate the data per-stage but that doesn't seem like the end of the world and one day we may wish to make dynamic offsets per-stage anyway.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Add a flag to avoid compacting push constants | Jason Ekstrand | 2019-11-18 | 6 | -145/+170
  In vec4, we can just not run the pass. In fs, things are a bit more deeply intertwined.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Pre-compute push ranges for graphics pipelines | Jason Ekstrand | 2019-11-18 | 8 | -64/+137
  It turns out that emitting push constants is one of the hottest paths in the driver and ANY work we do there costs us. By pre-computing things a bit ahead of time, we shave 5% off the runtime of a CPU-limited example running with the Dawn WebGPU implementation.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Stop bounds-checking pushed UBOs | Jason Ekstrand | 2019-11-18 | 1 | -28/+10
  The bounds checking is actually less safe than just pushing the data. If the bounds checking actually ever kicks in and it's not on the last UBO push range, then the shrinking will cause all subsequent ranges to be pushed to the wrong place in the GRF. One of the behaviors we definitely don't want is for OOB UBO access to result in completely unrelated UBOs returning garbage values. It's safer to just push the UBOs as-requested. If we're really concerned about robustness, we can emit shader code to do bounds checking which should be stupid cheap (a CMP followed by SEL).
  Cc: [email protected]
  Reviewed-by: Lionel Landwerlin <[email protected]>
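Both points in this message can be modelled in a few lines of C. The sketch below is an assumption-laden illustration, not driver or shader code: `range_start` models ranges packed back to back in the register file, and `load_checked` models a compare-then-select bounds check in scalar form. Returning 0 for an out-of-bounds read is a choice made for this example.

```c
#include <stdint.h>

/* Start register of range r when ranges are packed back to back:
 * the sum of the lengths of all earlier ranges. */
static unsigned range_start(const unsigned *lengths, unsigned r)
{
    unsigned start = 0;

    for (unsigned i = 0; i < r; i++)
        start += lengths[i];
    return start;
}

/* Scalar model of an in-shader bounds check. Assumes size_dwords >= 1.
 * Returning 0 when out of bounds is an assumption of this sketch. */
static uint32_t load_checked(const uint32_t *ubo, uint32_t size_dwords,
                             uint32_t index)
{
    int in_bounds = index < size_dwords;          /* the compare (CMP) */
    uint32_t safe_index = in_bounds ? index : 0;  /* the select (SEL) */

    return in_bounds ? ubo[safe_index] : 0;
}
```

With lengths {4, 2, 3} the third range starts at register 6, which is where the compiled shader expects it. If the first range were shrunk to 1, the third would start at 3 instead, so the shader would read unrelated data.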
* anv: Delete dead shader constant pushing code | Jason Ekstrand | 2019-11-18 | 2 | -13/+7
  As of 2d78e55a8c5481, nir_intrinsic_load_constant with a constant offset is constant-folded so we should never end up with any that trigger brw_nir_analyze_ubo_ranges.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout | Jason Ekstrand | 2019-11-18 | 6 | -76/+54
  This lets us stop tracking the pipeline layout. It also means less indirection on a very hot path. As an extra bonus, we can make some of our data structures smaller.
  No measurable CPU overhead improvement.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Input attachments are always single-plane | Jason Ekstrand | 2019-11-18 | 1 | -2/+3
  Reviewed-by: Lionel Landwerlin <[email protected]>
* genxml: Mark everything in genX_pack.h always_inline | Jason Ekstrand | 2019-11-18 | 1 | -8/+8
  Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Assume layout != NULL | Jason Ekstrand | 2019-11-18 | 1 | -21/+19
  In the early days of the driver we allowed layout to be VK_NULL_HANDLE and used that for some internal pipelines when we wanted to be lazy. Vulkan doesn't actually allow NULL layouts, however, so there's no reason to have this check.
  Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: remove old comment | Italo Nicola | 2019-11-18 | 1 | -3/+0
  This comment was correct some time ago, but since commit d3c10ad42729c1fe74a7f7c67465bd2, it isn't true anymore.
  Reviewed-by: Paulo Zanoni <[email protected]>
* pan/midgard: Use shader stage in mir_op_computes_derivative | Alyssa Rosenzweig | 2019-11-18 | 3 | -3/+10
  A 'normal' texture op may be emitted in a vertex shader on T720 but it still doesn't take any derivatives.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* i965: Unify CC_STATE and BLEND_STATE atoms on Haswell as a workaround | Danylo Piliaiev | 2019-11-18 | 1 | -2/+35
  Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting 3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in SuperTuxKart and Tropico 6 which was seen only on Haswell.
  The reason for this is unknown and the fix was found empirically. The closest mention in the PRM is that it should improve performance.
  From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
  "When the BLEND_STATE pointer changes but not the CC_STATE pointer, driver needs to force a CC_STATE pointer change to improve blend performance in pixel backend."
  Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
  Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
  Cc: <[email protected]>
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* radv: implement VK_AMD_device_coherent_memory | Samuel Pitoiset | 2019-11-18 | 3 | -15/+101
  This extension adds the device coherent and device uncached memory types. It's known to be slower than non-device coherent memory but it might be useful for debugging.
  This is only exposed for chips that support L2 uncached.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add radeon_info::has_l2_uncached | Samuel Pitoiset | 2019-11-18 | 2 | -0/+4
  For chips that have uncached device memory (i.e. MTYPE_UC).
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: enable mesa_glthread for GfxBench | Pierre-Eric Pelloux-Prayer | 2019-11-18 | 1 | -0/+4
  It improves offscreen tests performance.
  Reviewed-by: Marek Olšák <[email protected]>
* pan/midgard: Represent ld/st offset unpacked | Alyssa Rosenzweig | 2019-11-17 | 6 | -47/+14
  This simplifies manipulation of the offsets dramatically, fixing some UBO access related bugs.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Fix masks/alignment for 64-bit loads | Alyssa Rosenzweig | 2019-11-17 | 4 | -13/+37
  These need to be handled with special care. Oh, Midgard, you're *extra* special.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Expose more typesize helpers | Alyssa Rosenzweig | 2019-11-17 | 2 | -1/+21
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Implement non-aligned UBOs | Alyssa Rosenzweig | 2019-11-17 | 1 | -5/+2
  The field is more fine-grained than we had assumed.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* etnaviv: rs: upsampling is not supported | Christian Gmeiner | 2019-11-17 | 1 | -1/+32
  This change makes it possible to support different downsample cases like 4 -> 2 or 4 -> 1.
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Gert Wollny <[email protected]>
* freedreno/registers: fix a6xx_2d_blit_cntl ROTATE | Jonathan Marek | 2019-11-17 | 1 | -2/+1
  A change from b7093882 got overwritten by 610c8c93
  Fixes: 610c8c93 ("freedreno/registers: Update with GS, HS and DS registers")
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
* freedreno/ir3: disable texture prefetch for 1d array textures | Jonathan Marek | 2019-11-17 | 1 | -6/+5
  Prefetch only supports the basic 2D texture case, checking is_array is needed because 1d array textures pass the coord num_components==2 test.
  Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")
  Signed-off-by: Jonathan Marek <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
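The false positive described here is easy to reproduce in a toy model. The sketch below assumes a simplified, hypothetical `tex_op` struct standing in for a texture instruction; it is not the ir3 or NIR data structure, and the real pass checks more conditions than shown.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a texture instruction. */
struct tex_op {
    unsigned coord_components; /* number of coordinate components */
    bool is_array;             /* true for array textures */
};

/* Check before the fix (sketch): only the coordinate count is tested,
 * so a 1D array texture (x plus layer = 2 components) slips through. */
static bool prefetch_eligible_old(const struct tex_op *t)
{
    return t->coord_components == 2;
}

/* Check after the fix (sketch): array textures are rejected as well. */
static bool prefetch_eligible(const struct tex_op *t)
{
    return t->coord_components == 2 && !t->is_array;
}
```

A plain 2D texture passes both checks, while a 1D array texture passes only the old one.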
* lima: Parse VS and PLBU command stream while making a dump | Andreas Baierl | 2019-11-17 | 7 | -0/+461
  This makes the streams more readable and comparable with the blob's parser as it parses the VS and PLBU stream and shows the currently known values.
  Reviewed-by: Qiang Yu <[email protected]>
  Signed-off-by: Andreas Baierl <[email protected]>
* lima: Beautify stream dumps | Andreas Baierl | 2019-11-17 | 1 | -7/+11
  Change the dump so that the output looks more like the output of mali-syscall-tracker [1]. This is a preparation for a more detailed stream analysis.
  Reviewed-by: Qiang Yu <[email protected]>
  Signed-off-by: Andreas Baierl <[email protected]>
  [1]: https://gitlab.freedesktop.org/lima/mali-syscall-tracker
* clover/llvm: fix build after llvm 10 commit 1dfede3122ee | Aaron Watry | 2019-11-15 | 2 | -4/+20
  CodeGenFileType moved from ::llvm::TargetMachine in llvm/Target/TargetMachine.h to ::llvm:: in llvm/Support/CodeGen.h
  Signed-off-by: Aaron Watry <[email protected]>
  Reviewed-by: Jan Vesely <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
* android: util/format: fix include path list | Mauro Rossi | 2019-11-16 | 1 | -1/+2
  To avoid the following build error:
    out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/format/u_format_table.c:30:10: fatal error: 'u_format.h' file not found
    ^~~~~~~~~~~~
    1 error generated.
  Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
  Signed-off-by: Mauro Rossi <[email protected]>
* android: radeonsi: fix build error due to wrong u_format.csv file path | Mauro Rossi | 2019-11-15 | 1 | -1/+1
  GEN10_FORMAT_TABLE_INPUTS requires correction of the u_format.csv file path in order to avoid the following build error:
    ninja: error: 'external/mesa/util/format/u_format.csv', needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h', missing and no known rule to make it
  Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
  Signed-off-by: Mauro Rossi <[email protected]>
* mesa/st: Reuse st_choose_matching_format from st_choose_format(). | Eric Anholt | 2019-11-15 | 5 | -94/+39
  We had this ad-hoc exact size matching for unsized internalformats, but st_choose_matching_format() can do exactly what we want. This means that, for example, we'll now prefer the matching ordering for 565/565_REV if the driver supports both orders. We also pass Unpack.SwapBytes through from ChooseTextureFormat so that we can hit the memcpy path for 8888 formats when that flag is set.
  Some interesting format choice changes from this (on softpipe):
    intf/form/type           before             after
    ----------------------------------------------------------
    RGBA/RGBA/USHORT:        R8G8B8A8_UNORM  -> RGBA_UNORM16
    RGB/RGBA/8888:           X8B8G8R8_UNORM  -> R8G8B8X8_UNORM
    RGB/ABGR/8888_REV:       X8B8G8R8_UNORM  -> R8G8B8X8_UNORM
    RGBA/RGBA/5551:          B5G5R5A1_UNORM  -> A1B5G5R5_UNORM
    RGBA/RGBA/4444:          R8G8B8A8_UNORM  -> A4B4G4R4_UNORM
    RGBA/GL_RGBA/1010102:    R8G8B8A8_UNORM  -> A2B10G10R10_UNORM
    DEPTH/DEPTH/UINT:        Z24X8           -> Z_UNORM32
    DEPTH/DEPTH/USHORT:      Z24X8           -> Z_UNORM16
  v2: Make sure that the baseformat still matches. v1 would pick MESA_FORMAT_L16_UNORM for RED/LUMINANCE/SHORT, when we clearly want a red format.
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Don't put sRGB formats in the array format table. | Eric Anholt | 2019-11-15 | 1 | -8/+6
  sRGB vs unorm was the only conflict case being guarded against in this function. Before the PIPE_FORMAT conversion, we always listed the unorm before the sRGB in the enums, but PIPE_FORMAT_A8B8G8R8_SRGB happens to be before _UNORM. We always want the unorm result here.
  Fixes: 807a800d8c3e ("mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.")
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/st: Simplify st_choose_matching_format(). | Eric Anholt | 2019-11-15 | 1 | -27/+11
  We now have a nice helper function for finding those memcpy formats, without needing to go through each entry of the mesa format table to see if it happens to match.
  While looking at sysprof of a softpipe GLES2 CTS run, we were spending ~8% of the CPU on ChooseTextureFormat. With this, roughly the same region of the testsuite was .4%.
  v2: Add Ken's fix for canonicalizing array formats.
  Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Handle GL_COLOR_INDEX in _mesa_format_from_format_and_type(). | Kenneth Graunke | 2019-11-15 | 1 | -0/+3
  Just return MESA_FORMAT_NONE to avoid triggering unreachable; there's really no sensible thing to return for this case anyway.
  This prevents regressions in the next commit, which makes st/mesa start using this function to find a reasonable format from GL format and type enums.
  Reviewed-by: Eric Anholt <[email protected]>
* pan/midgard: Use generic constant packing for 8/64-bit | Alyssa Rosenzweig | 2019-11-15 | 1 | -1/+1
  Eventually, we will want to combine constants across types, but for now let's not break the world.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Pack 64-bit swizzles | Alyssa Rosenzweig | 2019-11-15 | 1 | -21/+63
  64-bit ops have their own funky swizzles. Let's pack them, both for native 64-bit sources as well as extended 32-bit sources.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Fix mir_round_bytemask_down for !32b | Alyssa Rosenzweig | 2019-11-15 | 1 | -2/+2
  Signed-off-by: Alyssa Rosenzweig <[email protected]>