Commit message (Author, Age, Files, Lines)
* meson: replace libmesa_util with idep_mesautil (Eric Engestrom, 2019-08-03, 51 files, -125/+124)
    This automates the include_directories and dependencies tracking so that all users of libmesa_util don't need to add them manually. Next commit will remove the ones that were only added for that reason. Signed-off-by: Eric Engestrom <[email protected]> Acked-by: Eric Anholt <[email protected]> Tested-by: Vinson Lee <[email protected]>
* pan/midgard: Print texture outmod (Alyssa Rosenzweig, 2019-08-02, 2 files, -4/+8)
    I have no idea who thought this was a good idea. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Promote all 16 uniforms (Alyssa Rosenzweig, 2019-08-02, 3 files, -9/+4)
    Now that register spilling is in place, this is reasonable. It turns out for some shaders, it's actually better to cap at 8 work registers and extra >8 uniform registers and tolerate the spilling, since the extra resulting threads make up for the spillage. So incidentally, the shader that spills here is in -bterrain, which jumps from 19fps to 21fps as a result of this change.

    total instructions in shared programs: 3513 -> 3448 (-1.85%)
    instructions in affected programs: 776 -> 711 (-8.38%)
    helped: 20
    HURT: 0
    helped stats (abs) min: 1 max: 8 x̄: 3.25 x̃: 2
    helped stats (rel) min: 3.57% max: 16.00% x̄: 8.37% x̃: 7.19%
    95% mean confidence interval for instructions value: -4.28 -2.22
    95% mean confidence interval for instructions %-change: -10.02% -6.73%
    Instructions are helped.

    total bundles in shared programs: 2067 -> 2024 (-2.08%)
    bundles in affected programs: 515 -> 472 (-8.35%)
    helped: 19
    HURT: 1
    helped stats (abs) min: 1 max: 6 x̄: 2.37 x̃: 2
    helped stats (rel) min: 2.13% max: 17.86% x̄: 10.19% x̃: 11.11%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23%
    95% mean confidence interval for bundles value: -3.01 -1.29
    95% mean confidence interval for bundles %-change: -12.13% -6.91%
    Bundles are helped.

    total quadwords in shared programs: 3468 -> 3426 (-1.21%)
    quadwords in affected programs: 764 -> 722 (-5.50%)
    helped: 19
    HURT: 1
    helped stats (abs) min: 1 max: 5 x̄: 2.26 x̃: 2
    helped stats (rel) min: 1.41% max: 12.50% x̄: 6.76% x̃: 7.14%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 1.08% max: 1.08% x̄: 1.08% x̃: 1.08%
    95% mean confidence interval for quadwords value: -2.83 -1.37
    95% mean confidence interval for quadwords %-change: -8.08% -4.65%
    Quadwords are helped.

    total registers in shared programs: 383 -> 360 (-6.01%)
    registers in affected programs: 112 -> 89 (-20.54%)
    helped: 19
    HURT: 0
    helped stats (abs) min: 1 max: 3 x̄: 1.21 x̃: 1
    helped stats (rel) min: 12.50% max: 27.27% x̄: 20.63% x̃: 20.00%
    95% mean confidence interval for registers value: -1.47 -0.95
    95% mean confidence interval for registers %-change: -22.39% -18.87%
    Registers are helped.

    total threads in shared programs: 432 -> 451 (4.40%)
    threads in affected programs: 19 -> 38 (100.00%)
    helped: 11
    HURT: 0
    helped stats (abs) min: 1 max: 2 x̄: 1.73 x̃: 2
    helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
    95% mean confidence interval for threads value: 1.41 2.04
    95% mean confidence interval for threads %-change: 100.00% 100.00%
    Threads are [helped].

    total loops in shared programs: 4 -> 4 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 0 -> 4
    spills in affected programs: 0 -> 4
    helped: 0
    HURT: 2

    total fills in shared programs: 0 -> 7
    fills in affected programs: 0 -> 7
    helped: 0
    HURT: 2

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Break mir_spill_register into its function (Alyssa Rosenzweig, 2019-08-02, 1 file, -117/+129)
    No functional changes, just breaks out a megamonster function and fixes the indentation. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Switch sources to an array for trinary sources (Alyssa Rosenzweig, 2019-08-02, 12 files, -145/+133)
    We need three independent sources to support indirect SSBO writes (as well as textures with both LOD/bias and offsets). Now is a good time to make sources just an array so we don't have to rewrite a ton of code if we ever needed a fourth source for some reason. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove "r27-only" register class (Alyssa Rosenzweig, 2019-08-02, 5 files, -97/+66)
    As far as I know, there's no such thing as a load/store op that only takes its argument in r27. We just need to set the appropriate arg_1 field in the RA to specify other registers if we want them. To facilitate this, various RA-related changes are needed across the compiler; this should also fix indirect offsets which were implicitly interpreted as "r27-only" despite not even passing through RA yet. One ripple effect change is switching the move insertion point and adjusting the liveness analysis accordingly, so while this was intended as a purely functional change, there are some shader-db changes:

    total instructions in shared programs: 3511 -> 3498 (-0.37%)
    instructions in affected programs: 563 -> 550 (-2.31%)
    helped: 12
    HURT: 0
    helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
    helped stats (rel) min: 0.93% max: 5.00% x̄: 2.58% x̃: 2.33%
    95% mean confidence interval for instructions value: -1.27 -0.90
    95% mean confidence interval for instructions %-change: -3.23% -1.93%
    Instructions are helped.

    total bundles in shared programs: 2067 -> 2067 (0.00%)
    bundles in affected programs: 398 -> 398 (0.00%)
    helped: 7
    HURT: 4
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 1.54% max: 10.00% x̄: 5.04% x̃: 5.56%
    HURT stats (abs) min: 1 max: 2 x̄: 1.75 x̃: 2
    HURT stats (rel) min: 2.13% max: 4.26% x̄: 3.72% x̃: 4.26%
    95% mean confidence interval for bundles value: -0.95 0.95
    95% mean confidence interval for bundles %-change: -5.21% 1.50%
    Inconclusive result (value mean confidence interval includes 0).

    total quadwords in shared programs: 3464 -> 3454 (-0.29%)
    quadwords in affected programs: 1199 -> 1189 (-0.83%)
    helped: 18
    HURT: 4
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 1.03% max: 5.26% x̄: 2.44% x̃: 1.79%
    HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
    HURT stats (rel) min: 2.56% max: 2.82% x̄: 2.63% x̃: 2.56%
    95% mean confidence interval for quadwords value: -0.98 0.07
    Inconclusive result (value mean confidence interval includes 0).

    total registers in shared programs: 383 -> 373 (-2.61%)
    registers in affected programs: 56 -> 46 (-17.86%)
    helped: 12
    HURT: 2
    helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    helped stats (rel) min: 9.09% max: 33.33% x̄: 29.58% x̃: 33.33%
    HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
    HURT stats (rel) min: 20.00% max: 50.00% x̄: 35.00% x̃: 35.00%
    95% mean confidence interval for registers value: -1.13 -0.29
    95% mean confidence interval for registers %-change: -35.07% -5.63%
    Registers are helped.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Handle get/set_swizzle for load/store arguments (Alyssa Rosenzweig, 2019-08-02, 2 files, -3/+83)
    Load/store's main "argument 0" already has its swizzle handled correctly (for stores, that is). But the tinier arguments, the compact ones with a component select but not a full swizzle, those are not yet handled. Let's do something about that!
* pan/midgard: Fix block successors (Alyssa Rosenzweig, 2019-08-02, 2 files, -29/+43)
    Rather than an ersatz thing that sort of looks like successors but is in fact just the source order traversal with some backward jumps hacked in for loops... construct an actual flow graph so we can do analysis sanely. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add helper to pack load/store registers (Alyssa Rosenzweig, 2019-08-02, 1 file, -0/+18)
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Decode register/component in load/store argument (Alyssa Rosenzweig, 2019-08-02, 2 files, -2/+24)
    3-bits out of 8 down! Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Fix REGISTER_OFFSET (Alyssa Rosenzweig, 2019-08-02, 2 files, -3/+2)
    r27 isn't the special one, usually. Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Split ld/st unknown to arg_1/arg_2 fields (Alyssa Rosenzweig, 2019-08-02, 7 files, -17/+46)
    The 16-bit field can be decomposed to two independent 8-bit fields, each representing a single (additional) argument to the load/store op, generally used for encoding registers. Addressable registers here are substantially limited compared to the main register in a load/store op. Signed-off-by: Alyssa Rosenzweig <[email protected]>
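The decomposition described above can be sketched in a few lines of C. This is only an illustration: the struct and helper names are hypothetical, and the choice of which byte becomes arg_1 and which becomes arg_2 is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 8-bit load/store arguments. */
struct ldst_args {
    uint8_t arg_1;
    uint8_t arg_2;
};

/* Two one-byte members should pack into exactly 16 bits. */
static_assert(sizeof(struct ldst_args) == 2, "expected a 16-bit pair");

/* Split a 16-bit field into two independent 8-bit arguments.
 * ASSUMPTION: arg_1 is the low byte, arg_2 the high byte. */
static struct ldst_args split_ldst_field(uint16_t field)
{
    struct ldst_args args = {
        .arg_1 = (uint8_t)(field & 0xFF),
        .arg_2 = (uint8_t)(field >> 8),
    };
    return args;
}

/* Inverse of split_ldst_field(): rebuild the 16-bit field. */
static uint16_t join_ldst_field(struct ldst_args args)
{
    return (uint16_t)(args.arg_1 | ((uint16_t)args.arg_2 << 8));
}
```

Under the assumed byte order, splitting 0x1E7F would give arg_1 = 0x7F and arg_2 = 0x1E, and joining them restores the original value.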
* radv: Expose VK_KHR_imageless_framebuffer. (Bas Nieuwenhuizen, 2019-08-02, 3 files, -0/+8)
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Implement VK_KHR_imageless_framebuffer. (Bas Nieuwenhuizen, 2019-08-02, 2 files, -10/+38)
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Store image view also outside framebuffer. (Bas Nieuwenhuizen, 2019-08-02, 6 files, -33/+31)
    So we can use it with imageless framebuffers. Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Store color/depth surface info in attachment info instead of framebuffer. (Bas Nieuwenhuizen, 2019-08-02, 7 files, -104/+102)
    That way we can use it for imageless framebuffers. Reviewed-by: Samuel Pitoiset <[email protected]>
* panfrost: Allocate polygon lists on-demand (Alyssa Rosenzweig, 2019-08-02, 6 files, -10/+36)
    Rather than allocating a huge (64MB) polygon list on context creation and sharing it across framebuffers, we instead allocate polygon lists as BOs (which consistently hit the cache) sized appropriately; for about a month, we've known how to calculate the polygon list size so this has only recently become possible. The good news is we can render to truly massive framebuffers without crashing and, more importantly, we eliminate the 64MB upfront overhead. If a list that size isn't actually needed, it's not allocated. Signed-off-by: Alyssa Rosenzweig <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>
* panfrost: Handle the bo == NULL case in panfrost_bo_[un]reference() (Boris Brezillon, 2019-08-02, 1 file, -1/+5)
    Allows us to pass BOs without checking if they're NULL or not. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
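The pattern of tolerating NULL inside the reference helpers, so that callers do not need their own checks, might look roughly like the following. This is a minimal sketch with a hypothetical `demo_bo` type and a plain (non-atomic) counter; it is not the panfrost implementation, whose BO structure and refcounting details are not shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object; not the real panfrost BO. */
struct demo_bo {
    int refcnt;
};

/* Allocate a BO holding one reference; returns NULL on allocation failure. */
static struct demo_bo *demo_bo_create(void)
{
    struct demo_bo *bo = calloc(1, sizeof(*bo));
    if (bo)
        bo->refcnt = 1;
    return bo;
}

/* Take a reference. A NULL BO is silently ignored. */
static void demo_bo_reference(struct demo_bo *bo)
{
    if (!bo)
        return;
    bo->refcnt++;
}

/* Drop a reference and free the BO when the count reaches zero.
 * A NULL BO is silently ignored. Returns 1 if the BO was freed. */
static int demo_bo_unreference(struct demo_bo *bo)
{
    if (!bo)
        return 0;
    assert(bo->refcnt > 0);
    if (--bo->refcnt == 0) {
        free(bo);
        return 1;
    }
    return 0;
}
```

With helpers shaped like this, a caller can write `demo_bo_unreference(maybe_null_bo)` unconditionally, which is the convenience the commit message describes.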
* panfrost: Get rid of the skippable param in attach_vt_framebuffer() (Boris Brezillon, 2019-08-02, 1 file, -3/+3)
    The only user of this function always passes true. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Don't emit a new FB desc when setting a new FB state (Boris Brezillon, 2019-08-02, 1 file, -1/+5)
    The FB desc will be emitted/attached on the first draw targeting this new FB. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Bail out early when doing a wallpaper blit (Boris Brezillon, 2019-08-02, 1 file, -2/+14)
    The wallpaper blit is a bit special in that the operation is targeting the current FB, but the u_blitter logic creates a new surface for it which makes util_framebuffer_state_equal() return false. In that case we don't want a new FB descriptor to be emitted/attached, so let's just copy the new state into ctx->pipe_framebuffer and exit the function. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Bail out early when new and current FB states are equal (Boris Brezillon, 2019-08-02, 1 file, -0/+4)
    If the current FB matches the new one there's nothing to be done in panfrost_set_framebuffer_state(). By bailing out early in that case we avoid emitting new FB descriptors (the old ones are still valid). Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Delay FB descriptor allocation (Boris Brezillon, 2019-08-02, 2 files, -18/+6)
    No need to emit SFBD/MFBD at frame invalidation. They can be emitted when the framebuffer is attached, which saves us a potential FB desc re-allocation if a new FB is bound after the swap. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove job from ctx->jobs at submission time (Boris Brezillon, 2019-08-02, 1 file, -0/+8)
    This guarantees that new draws targeting the same framebuffer will get a new job instance. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Make ctx->job useful (Boris Brezillon, 2019-08-02, 2 files, -1/+23)
    ctx->job is supposed to serve as a cache to avoid a hash table lookup every time we access the job attached to the currently bound FB, except it was never assigned to anything but NULL. Fix that by adding the missing assignment in panfrost_get_job_for_fbo(). Also add a missing NULL assignment in the ->set_framebuffer_state() path. While at it, add extra assert()s to make sure ctx->job is consistent. Fixes: 59c9623d0a75 ("panfrost: Import job data structures from v3d") Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]>
* ac/nir,radv: Optimize bounds check for 64 bit CAS. (Bas Nieuwenhuizen, 2019-08-02, 8 files, -17/+37)
    When the application does not ask for robust buffer access. Only implemented the check in radv. Reviewed-by: Samuel Pitoiset <[email protected]>
* gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8 (Roland Scheidegger, 2019-08-02, 1 file, -1/+3)
    These versions still need the wrapper but already have both success and failure ordering. (Compile tested on llvm 3.3, 3.7, 3.8.) v2: don't duplicate whole function (suggested by Brian). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111102 Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* util: Handle differences in pthread_setname_np (Matt Turner, 2019-08-02, 1 file, -0/+11)
    There are a lot of unfortunate differences in the implementation of this function. NetBSD and Mac OS X in particular require different arguments. https://stackoverflow.com/questions/2369738/how-to-set-the-name-of-a-thread-in-linux-pthreads/7989973#7989973 provides a good overview of the differences. Fixes: 9c411e020d1 ("util: Drop preprocessor guards for glibc-2.12") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111264 Reviewed-by: Eric Engestrom <[email protected]> [Eric: use DETECT_OS_* instead of PIPE_OS_*] Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
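To make the argument differences concrete, here is a minimal sketch of a portable wrapper. It is not the code from this commit: it uses the compilers' predefined macros (`__linux__`, `__APPLE__`, and so on) instead of Mesa's DETECT_OS_* macros so that it stands alone, and the wrapper name `set_current_thread_name` is hypothetical. The FreeBSD/OpenBSD branch is an addition beyond the two platforms the message names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only declares pthread_setname_np with this set */
#endif
#include <pthread.h>
#if defined(__FreeBSD__) || defined(__OpenBSD__)
#include <pthread_np.h> /* declares pthread_set_name_np() */
#endif
#include <assert.h>

/* Name the calling thread. Returns 0 on success, an errno value otherwise.
 * On Linux the name, including the terminating NUL, must fit in 16 bytes. */
static int set_current_thread_name(const char *name)
{
    assert(name != NULL);
#if defined(__linux__)
    /* Linux/glibc: (thread, name) */
    return pthread_setname_np(pthread_self(), name);
#elif defined(__APPLE__)
    /* macOS: only the calling thread can be named, so there is no
     * thread argument at all. */
    return pthread_setname_np(name);
#elif defined(__NetBSD__)
    /* NetBSD: (thread, printf-style format, single format argument) */
    return pthread_setname_np(pthread_self(), "%s", (void *)name);
#elif defined(__FreeBSD__) || defined(__OpenBSD__)
    /* FreeBSD/OpenBSD: different spelling and no return value. */
    pthread_set_name_np(pthread_self(), name);
    return 0;
#else
    /* Unknown platform: naming threads is best-effort, so do nothing. */
    (void)name;
    return 0;
#endif
}
```

A caller would typically invoke `set_current_thread_name("worker")` at the top of a thread's entry function and ignore failures, since the name is only a debugging aid.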
* util/os_time: use detect_os.h to uncouple from gallium (Eric Engestrom, 2019-08-02, 1 file, -11/+9)
    Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util/u_debug: use detect_os.h (Eric Engestrom, 2019-08-02, 2 files, -3/+4)
    Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util/os_misc: use detect_os.h to start uncoupling from gallium (Eric Engestrom, 2019-08-02, 2 files, -14/+15)
    Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util/os_memory: use detect_os.h to uncouple it from gallium (Eric Engestrom, 2019-08-02, 4 files, -14/+3)
    While at it, remove p_compiler.h as well as it is unused. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* gallium: deduplicate os detection logic by using detect_os.h (Eric Engestrom, 2019-08-02, 1 file, -28/+19)
    This allows us to avoid having to rename all the PIPE_OS_* at once while still making sure PIPE_OS_* and DETECT_OS_* are always in sync. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* gallium/utils: drop PIPE_SUBSYSTEM_WINDOWS_USER (Eric Engestrom, 2019-08-02, 10 files, -37/+18)
    This is basically just an alias for PIPE_OS_WINDOWS. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* scons: rename PIPE_SUBSYSTEM_EMBEDDED to EMBEDDED_DEVICE (Eric Engestrom, 2019-08-02, 7 files, -8/+8)
    It has nothing to do with the PIPE_SUBSYSTEM_* stuff from gallium. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* gallium: remove never-used PIPE_SUBSYSTEM_DRI (Eric Engestrom, 2019-08-02, 1 file, -4/+0)
    PIPE_SUBSYSTEM_DRI was introduced in dacfef158943665fc0d1 ("gallium: New configuration header.") 11 years ago, and was never used. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util: fix typo in comment (Eric Engestrom, 2019-08-02, 1 file, -1/+1)
    Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* util: introduce detect_os.h (Eric Engestrom, 2019-08-02, 1 file, -0/+131)
    Mostly copied from src/gallium/include/pipe/p_config.h, so I kept its copyright and authorship. Other than the obvious rename, the big difference is that these are always defined, to be used as `#if DETECT_OS_LINUX`. Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Matt Turner <[email protected]>
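The "always defined" convention can be illustrated with a small sketch. The DEMO_OS_* names below are hypothetical stand-ins chosen to avoid implying anything about the actual contents of detect_os.h; only the pattern (detect, then default every macro to 0) is the point.

```c
#include <assert.h>

/* Step 1: set a macro to 1 for each platform the compiler reports. */
#if defined(__linux__)
#define DEMO_OS_LINUX 1
#endif
#if defined(__APPLE__)
#define DEMO_OS_APPLE 1
#endif
#if defined(_WIN32)
#define DEMO_OS_WINDOWS 1
#endif

/* Step 2: default everything else to 0. Because every macro is now
 * always defined, `#if DEMO_OS_LINUX` is valid everywhere, and a
 * misspelled macro name can be caught with -Wundef instead of silently
 * evaluating as false the way `#ifdef` would. */
#ifndef DEMO_OS_LINUX
#define DEMO_OS_LINUX 0
#endif
#ifndef DEMO_OS_APPLE
#define DEMO_OS_APPLE 0
#endif
#ifndef DEMO_OS_WINDOWS
#define DEMO_OS_WINDOWS 0
#endif

/* The three platforms in this sketch are mutually exclusive. */
static_assert(DEMO_OS_LINUX + DEMO_OS_APPLE + DEMO_OS_WINDOWS <= 1,
              "at most one demo OS macro may be set");

/* Usage: the macros work in #if / #elif chains. */
static const char *demo_os_name(void)
{
#if DEMO_OS_LINUX
    return "linux";
#elif DEMO_OS_APPLE
    return "apple";
#elif DEMO_OS_WINDOWS
    return "windows";
#else
    return "unknown";
#endif
}
```

Since the macros always expand to 0 or 1, they can also be used in ordinary C expressions such as `if (DEMO_OS_LINUX) { ... }`, which keeps both branches visible to the compiler.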
* freedreno/batch: fix dependency loop detection (Rob Clark, 2019-08-02, 1 file, -11/+10)
    We can have a scenario like:

        A -> B
        A -> C -> B

    When adding the A->C dependency, it doesn't really matter that C depends on something that A depends on, that isn't a necessary condition for a dependency loop. Instead what we want to know is that nothing C depends on, directly or indirectly, depends on A. We can detect this by recursively OR'ing the dependents_mask of C and all its dependencies. Signed-off-by: Rob Clark <[email protected]>
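The check described above can be sketched as follows. This is an illustrative model, not the freedreno code: `struct demo_batch`, the global `batches` table, and the helper names are hypothetical, and the sketch assumes that bit i of `dependents_mask` means "this batch depends directly on batch i".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BATCHES 32

/* Hypothetical batch record. ASSUMPTION: bit i of dependents_mask is
 * set when this batch directly depends on batch i. */
struct demo_batch {
    unsigned idx;
    uint32_t dependents_mask;
};

static struct demo_batch batches[MAX_BATCHES];

/* Reset the table so every batch knows its own index and has no deps. */
static void init_batches(void)
{
    for (unsigned i = 0; i < MAX_BATCHES; i++) {
        batches[i].idx = i;
        batches[i].dependents_mask = 0;
    }
}

/* OR together the mask of a batch and of everything it depends on,
 * directly or indirectly. Terminates because add_dep() below never
 * lets a cycle into the graph. */
static uint32_t recursive_dependents_mask(const struct demo_batch *batch)
{
    uint32_t mask = batch->dependents_mask;

    for (unsigned i = 0; i < MAX_BATCHES; i++) {
        if (batch->dependents_mask & (UINT32_C(1) << i))
            mask |= recursive_dependents_mask(&batches[i]);
    }
    return mask;
}

/* Record "batch depends on dep". Refuses (returns false) if dep already
 * depends on batch, directly or indirectly, since that would be a loop. */
static bool add_dep(struct demo_batch *batch, struct demo_batch *dep)
{
    assert(batch && dep);
    if (batch == dep)
        return false;
    if (recursive_dependents_mask(dep) & (UINT32_C(1) << batch->idx))
        return false;
    batch->dependents_mask |= UINT32_C(1) << dep->idx;
    return true;
}
```

In the scenario from the message, adding A->B, then C->B, then A->C all succeed, because nothing reachable from C leads back to A; a later attempt to add B->A would be refused.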
* freedreno/a6xx: add missing flush/invalidates for blit (Rob Clark, 2019-08-02, 2 files, -15/+9)
    Various things we were missing for multiple blits in a single batch. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: skip tiles with no geometry (Rob Clark, 2019-08-02, 3 files, -3/+66)
    If no clear, and no geometry according to VSC_STATE[pipe] we can skip the tile entirely. If there is a fast-clear, we can't skip restore (clear) or resolve IBs, but we can still skip draw IB. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: VSC overflow detection/handling (Rob Clark, 2019-08-02, 3 files, -34/+266)
    Check VSC_SIZE/VSC_SIZE2 regs from cmdstream to detect overflow, and skip use of VSC visibility stream when overflow is detected, to avoid GPU hangs. This is done w/ introduction of some CP_REG_TEST/CP_COND_REG_EXEC packet pairs. In addition, eventually (after a frame or two) detect the condition and resize the VSC buffers until overflow no longer happens. Note that this significantly reduces the initial size of the VSC buffers, backing out a previous hack to make them 16x larger than what should be typically required (the previous "solution" for VSC overflow). Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: remove USE/IGNORE_VISIBILITY draw patching (Rob Clark, 2019-08-02, 2 files, -23/+9)
    Seems this isn't needed anymore on a6xx to control whether visibility stream is used. And it would be hard to deal with if it was, for disabling use of VSC stream in draw pass. So just remove it and simplify things. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/a6xx: cleanup "blit_mem" (Rob Clark, 2019-08-02, 4 files, -14/+25)
    Rename to "control_mem", and switch to using a struct to manage the layout, rather than just ad-hoc hard-coded offsets. For recovering from VSC stream overflow, we'll need to add more, but best to clean it up first. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno: refresh tile debug (Rob Clark, 2019-08-02, 1 file, -15/+22)
    Fix some #ifdef'd bitrot, and get rid of #ifdef so it doesn't bitrot again. And add prints for per-tile state. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno: update registers (Rob Clark, 2019-08-02, 2 files, -4/+42)
    Pull in some updates of VSC regs Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/gmem: small cleanup (Rob Clark, 2019-08-02, 1 file, -2/+2)
    Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* freedreno/drm: convert ring_pool to child_pool (Rob Clark, 2019-08-02, 3 files, -6/+29)
    Worth another couple percent at driver2 Signed-off-by: Rob Clark <[email protected]>
* freedreno/drm: remove idx_lock (Rob Clark, 2019-08-02, 3 files, -29/+24)
    Since it ends up contended, it is a bit of a bottleneck for workloads with high driver overhead. Worth nearly +10% at gfxbench driver2. Signed-off-by: Rob Clark <[email protected]>
* freedreno/batch: always update last_fence (Rob Clark, 2019-08-02, 1 file, -0/+2)
    Not all flush paths come thru fd_context_flush(), so we should also set last_fence in the batch flush path. This avoids some no-op flushes just to get a fence. For example when pctx->flush_resource() triggers a flush. We should probably keep the last_fence update in fd_context_flush() as well to handle deferred flush case. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>