Commit message | Author | Age | Files | Lines
* mesa: avoid warning on Windows | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
  On Windows, p_atomic_inc_return returns an unsigned long long rather than the type the pointer refers to, so let's make sure we cast the result to the right type. Otherwise, we'll trigger a warning about the wrong format-string for the type.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
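The cast described above can be illustrated with a minimal sketch. `wide_inc_return` is a hypothetical stand-in for an increment helper whose return type is wider than the counter (it is not atomic and is not Mesa's `p_atomic_inc_return`), and `format_next_id` is likewise an invented name; the point is only that the value is narrowed to the counter's type before it reaches a `%u` format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: increments the counter but returns a wider type,
 * as the commit message says happens on Windows. Not actually atomic. */
static unsigned long long
wide_inc_return(unsigned *counter)
{
    return ++(*counter);
}

/* Formats the next id. The explicit cast makes the argument type match
 * the "%u" conversion, so no format-string warning is triggered. */
static int
format_next_id(unsigned *counter, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    unsigned id = (unsigned)wide_inc_return(counter);
    return snprintf(buf, len, "%u", id);
}
```

Without the cast, an `unsigned long long` would be passed where `%u` expects an `unsigned`, which is what the compiler warns about.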
* win32: unify strcasecmp definitions | Erik Faye-Lund | 2019-08-15 | 7 | -3/+6
  There were two incompatible definitions of strcasecmp, which led to a compiler warning. Let's clean this up by only leaving one of them, and using that one all the time.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* mesa/main: avoid warning when casting offset to pointer | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
  This generates a warning on some 64-bit systems, so let's cast to a properly sized integer first.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
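As a rough illustration of "cast to a properly sized integer first", the sketch below converts a 32-bit offset to a pointer by way of `uintptr_t`. The helper names are hypothetical and the 32-bit offset type is an assumption; the actual Mesa code may use different types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a 32-bit offset as a pointer value.
 * Widening to uintptr_t first gives the integer the same width as a
 * pointer, which avoids "cast to pointer from integer of different
 * size" warnings on 64-bit targets. */
static inline const void *
offset_to_pointer(uint32_t offset)
{
    return (const void *)(uintptr_t)offset;
}

/* Hypothetical inverse: recover the offset from the encoded pointer. */
static inline uint32_t
pointer_to_offset(const void *ptr)
{
    uintptr_t bits = (uintptr_t)ptr;
    assert(bits <= UINT32_MAX);
    return (uint32_t)bits;
}
```

The same two-step cast applies to the intentionally bogus pointer in the nir commit that follows.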
* nir: avoid warning when casting bogus pointer | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
  This intentionally-bogus pointer generates a warning on some 64-bit systems, so let's cast to a properly-sized integer first.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* glsl: fixup u64-warning | Erik Faye-Lund | 2019-08-15 | 1 | -1/+1
  Similarly to the unsigned version, we need to first cast the result to a suitable integer type before negating the number; otherwise we'll trigger a warning.
  Signed-off-by: Erik Faye-Lund <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* isl: Enable Unorm Path in Color Pipe | Kenneth Graunke | 2019-08-15 | 2 | -0/+9
  Improves performance on my Icelake 8x8 locked to 700MHz. For example, some GfxBench5 subtests have the following results:
  - [i965] gl_manhattan: 7.01119% +/- 0.180971% (n=5)
  - [i965] gl_4 (Car Chase): 4.24351% +/- 0.175622% (n=5)
  - [i965] gl_blending: 3.36327% +/- 0.180267% (n=5)
  - [i965] gl_5_normal (Aztec Ruins): 1.67962% +/- 0.243534% (n=10)
  - [iris] gl_manhattan: 3.92357% +/- 0.073965% (n=25)
  - [iris] gl_4 (Car Chase): 2.17746% +/- 0.0826858% (n=5)
  - [iris] gl_blending: 2.79599% +/- 0.803652% (n=15)
  - [iris] gl_5_normal (Aztec Ruins): 1.30930% +/- 0.106523% (n=25)
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Properly initialize device->slice_hash. | Rafael Antognolli | 2019-08-15 | 1 | -2/+2
  When subslices_delta == 0 and we take the early return, device->slice_hash is not initialized on GEN11. It then causes a segfault when going through anv_DestroyDevice, if compiled with valgrind.
  Fixes: 7bc022b4bbc ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.")
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/compiler: Fix resource leak in error path | Danylo Piliaiev | 2019-08-15 | 1 | -0/+1
  CID: 1452261
  Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly"
  Signed-off-by: Danylo Piliaiev <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* panfrost: Implement native RECT textures | Alyssa Rosenzweig | 2019-08-14 | 1 | -8/+4
  We started honouring the normalized_coords flag in the texture descriptor, but a bisection revealed that broke RECT textures -- since we were *also* lowering them in the shader. So just remove the shader-based lowering, use native RECT textures, and enjoy the nominal reduction in complexity and performance boost.
  Fixes: 3e47a1181b7 ("panfrost: Add MALI_SAMP_NORM_COORDS flag")
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Add R10G10B10A2_SSCALED vertex format | Alyssa Rosenzweig | 2019-08-14 | 1 | -0/+4
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Disassemble UBO index explicitly | Alyssa Rosenzweig | 2019-08-14 | 1 | -2/+9
  It's a bit of a special case but that's fine.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Account for unaligned UBOs when promoting uniforms | Alyssa Rosenzweig | 2019-08-14 | 1 | -4/+20
  We only know how to promote aligned accesses, although theoretically we should be able to promote unaligned to swizzles in the future. Check this.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_ubo_shift helper | Alyssa Rosenzweig | 2019-08-14 | 2 | -0/+22
  Different UBO reads have different shift requirements.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Address emit_ubo_read offset in bytes | Alyssa Rosenzweig | 2019-08-14 | 1 | -6/+6
  We'll want to be smarter about unaligned reads, so let's get this code all in one place.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Wire writemask into UBO reads | Alyssa Rosenzweig | 2019-08-14 | 2 | -10/+18
  Helps the disassembly be clearer and maybe regalloc be smarter.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Identify UBO/SSBO op symmetry | Alyssa Rosenzweig | 2019-08-14 | 5 | -15/+28
  It's the same thing, just shifted.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Extend blending to MRT | Alyssa Rosenzweig | 2019-08-14 | 3 | -44/+58
  Our hardware supports independent (per-RT) blending, but we need to route those settings through from Gallium.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Emit store_output branch just-in-time | Alyssa Rosenzweig | 2019-08-14 | 2 | -27/+62
  We'll need multiple branches for MRT, so we can't defer. Also, we need to track dependencies to ensure r0 is set to the correct value for each store_output.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add dont_eliminate flag | Alyssa Rosenzweig | 2019-08-14 | 2 | -0/+4
  We need to treat fragment writes specially.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/mfbd: Stuff in RT count | Alyssa Rosenzweig | 2019-08-14 | 1 | -8/+10
  Fixes DATA_INVALID_FAULTs with multiple render targets. We do always allocate space for 4 cbufs just to keep things sane. This may not be strictly necessary.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Dump FBD tagged pointer | Alyssa Rosenzweig | 2019-08-14 | 1 | -2/+7
  Turns out the rt count is stuffed in here.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Decode invalid access type upon fault | Alyssa Rosenzweig | 2019-08-14 | 2 | -2/+33
  We don't have a good way to confirm this, but it parallels the kernel definitions for MMU faults nicely.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Fix duplicate heap_end property | Alyssa Rosenzweig | 2019-08-14 | 1 | -1/+1
  This was supposed to read heap_start. It's the same value but still, better get this right.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Note "MFBD preload disable" bit | Alyssa Rosenzweig | 2019-08-14 | 3 | -4/+16
  It's a chicken bit, as far as I can tell. Buck buck.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Link in compiler | Alyssa Rosenzweig | 2019-08-14 | 2 | -6/+12
  We enable the standalone compiler, build the new files, and let it blast.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Check in remainder of the Bifrost compiler | Alyssa Rosenzweig | 2019-08-14 | 5 | -0/+711
  What it says on the tin.
  Signed-off-by: Ryan Houdek <[email protected]>
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Add bifrost_print.c/h | Alyssa Rosenzweig | 2019-08-14 | 2 | -0/+191
  IR printers.
  Signed-off-by: Ryan Houdek <[email protected]>
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Style format the disassembler | Alyssa Rosenzweig | 2019-08-14 | 4 | -1084/+2270
  $ astyle *.c *.h --style=linux -s8
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Stub out standalone compiler | Alyssa Rosenzweig | 2019-08-14 | 1 | -2/+47
  We don't actually have a standalone compiler in-tree yet, but let's get prepared for when we do.
  Signed-off-by: Ryan Houdek <[email protected]>
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/bifrost: Sync disassembler with Ryan's tree | Alyssa Rosenzweig | 2019-08-14 | 3 | -19/+130
  The disassembler was updated to move common code with the compiler into a shared header. Additionally, some new ops and control registers relating to rounding were added.
  Signed-off-by: Ryan Houdek <[email protected]>
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove standalone pandecode tool | Alyssa Rosenzweig | 2019-08-14 | 2 | -156/+0
  Now that panwrap has gained the ability to trace directly without dumping to the filesystem, there's no need to lug around this tool. I can assure you nobody will miss it.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Fix disassembly termination condition | Alyssa Rosenzweig | 2019-08-14 | 1 | -2/+2
  Fixes: 863bdd1f8dc ("pan/midgard: Break, not return, in disassembler")
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Ensure we upload at least 1 blend RT | Alyssa Rosenzweig | 2019-08-14 | 1 | -1/+1
  Otherwise we'll get memory junk.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Zero tripipe on initialize | Alyssa Rosenzweig | 2019-08-14 | 1 | -1/+1
  I don't think the hardware cares, but this adds a lot of noise to traces that we would rather not need to look at.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Improve disassembler robustness | Alyssa Rosenzweig | 2019-08-14 | 1 | -0/+9
  Some memory corruption / etc issues led to an accidental "fuzzing" of the disassembler ;) This uncovered some issues leading to a disassembler hang, so let's fix that.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Split public.h out | Alyssa Rosenzweig | 2019-08-14 | 2 | -12/+53
  We want a defined ABI for tracing; this set of functions should be as small as strictly necessary to minimize panwrap shenanigans.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/decode: Prefer uint64_t to mali_ptr | Alyssa Rosenzweig | 2019-08-14 | 3 | -14/+14
  This removes an unwanted dependency on panfrost-job.h
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Allocate spill_slot once | Alyssa Rosenzweig | 2019-08-14 | 1 | -1/+3
  Multiple spill moves share a single spill slot. Issue found in Krita.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Use hint on midgard_instruction for spill_move | Alyssa Rosenzweig | 2019-08-14 | 2 | -4/+16
  This allows us to have multiple spill moves, whereas otherwise for N spill moves, the first N-1 would be clobbered. Issue found in Krita.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Remove panfrost_add_dependency asserts | Alyssa Rosenzweig | 2019-08-14 | 1 | -4/+0
  It doesn't... make a ton of sense to need to assert and this routine is hotter than you might expect. Doesn't matter for release builds, of course.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radeonsi: add support for Renoir | Marek Olšák | 2019-08-14 | 8 | -2/+17
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* meson: add nir tests to the compiler/nir test suite | Eric Engestrom | 2019-08-14 | 1 | -2/+5
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* EGL: sync headers with Khronos | Eric Engestrom | 2019-08-14 | 4 | -19/+124
  Signed-off-by: Eric Engestrom <[email protected]>
* relnotes: Add new ext on etnaviv for 19.2. | Christian Gmeiner | 2019-08-14 | 1 | -0/+1
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
* etnaviv: fix weird indentation | Christian Gmeiner | 2019-08-14 | 1 | -7/+3
  Fixes: 797a2e4fd03 ("etnaviv: update logic to determine uniform limits")
  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jonathan Marek <[email protected]>
* nir/algebraic: Reassociate shift-by-constant of shift-by-constant | Ian Romanick | 2019-08-14 | 1 | -1/+25
  v2: After some review discussion with Alyssa, the replacements now correctly account for cases where (b+c) >= bitsize.
  v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason.

  Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
  total instructions in shared programs: 16251155 -> 16249576 (<.01%)
  instructions in affected programs: 232627 -> 231048 (-0.68%)
  helped: 547
  HURT: 1
  helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3
  helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06%
  HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
  HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
  95% mean confidence interval for instructions value: -3.12 -2.65
  95% mean confidence interval for instructions %-change: -1.20% -1.06%
  Instructions are helped.

  total cycles in shared programs: 365924392 -> 365372103 (-0.15%)
  cycles in affected programs: 59207053 -> 58654764 (-0.93%)
  helped: 497
  HURT: 34
  helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16
  helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82%
  HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63
  HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06%
  95% mean confidence interval for cycles value: -1426.41 -653.77
  95% mean confidence interval for cycles %-change: -1.66% -1.15%
  Cycles are helped.

  total spills in shared programs: 8870 -> 8871 (0.01%)
  spills in affected programs: 104 -> 105 (0.96%)
  helped: 0
  HURT: 1

  Ivy Bridge and all pre-Gen7 platforms had similar results. (Ivy Bridge shown)
  total instructions in shared programs: 11956236 -> 11955635 (<.01%)
  instructions in affected programs: 94110 -> 93509 (-0.64%)
  helped: 106
  HURT: 0
  helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4
  helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76%
  95% mean confidence interval for instructions value: -6.62 -4.72
  95% mean confidence interval for instructions %-change: -2.27% -1.64%
  Instructions are helped.

  total cycles in shared programs: 179296340 -> 178788044 (-0.28%)
  cycles in affected programs: 51009603 -> 50501307 (-1.00%)
  helped: 82
  HURT: 7
  helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16
  helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11%
  HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2
  HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10%
  95% mean confidence interval for cycles value: -7649.38 -3773.00
  95% mean confidence interval for cycles %-change: -2.71% -1.99%
  Cycles are helped.

  Reviewed-by: Alyssa Rosenzweig <[email protected]> [v2]
  Reviewed-by: Jason Ekstrand <[email protected]>
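The identity behind this commit, including the (b+c) >= bitsize case from the v2 note, can be checked with a small C sketch. This is not the NIR algebraic rule itself (that lives in Python); the function names are hypothetical, the width is fixed at 32 bits, and both shift counts are assumed to already be in the range 0..31.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: two shifts by constants, one after the other. */
static inline uint32_t
shl_nested(uint32_t a, unsigned b, unsigned c)
{
    assert(b < 32 && c < 32);
    return (a << b) << c;
}

/* Reassociated form: a single shift by (b + c). When the combined count
 * reaches the bit size, every bit has been shifted out, so the result is
 * 0. The explicit check also avoids an out-of-range shift, which is
 * undefined in C. */
static inline uint32_t
shl_merged(uint32_t a, unsigned b, unsigned c)
{
    assert(b < 32 && c < 32);
    unsigned total = b + c;
    return total >= 32 ? 0u : a << total;
}
```

For example, with b = 20 and c = 20 the nested form shifts every bit out, and the merged form has to return 0 rather than shift by 40.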
* nir/algebraic: Reassociate add-and-shift to be shift-and-add | Ian Romanick | 2019-08-14 | 1 | -0/+5
  A common thing in many shaders:

      uniform vs { vec4 bones[...]; };
      ...
      x = some_calculation(bones[i + 0]);
      y = some_calculation(bones[i + 1]);
      z = some_calculation(bones[i + 2]);

  This turns into stuff like

      vec1 32 ssa_12 = iadd ssa_11, ssa_0
      vec1 32 ssa_13 = ishl ssa_12, ssa_3
      vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
      vec1 32 ssa_15 = iadd ssa_11, ssa_1
      vec1 32 ssa_16 = ishl ssa_15, ssa_3
      vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
      vec1 32 ssa_18 = iadd ssa_11, ssa_2
      vec1 32 ssa_19 = ishl ssa_18, ssa_3
      vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

  By reassociating the shift and the add, we can reduce this to

      vec1 32 ssa_12 = ishl ssa_11, ssa_3
      vec1 32 ssa_13 = iadd ssa_12, ssa_0
      vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0)
      vec1 32 ssa_16 = iadd ssa_12, ssa_1
      vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0)
      vec1 32 ssa_19 = iadd ssa_12, ssa_2
      vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0)

  v2: Add some commentary from Rhys Perry's nearly identical patch.

  All Intel platforms had similar results. (Ice Lake shown)
  total instructions in shared programs: 16277758 -> 16250704 (-0.17%)
  instructions in affected programs: 1440284 -> 1413230 (-1.88%)
  helped: 4920
  HURT: 6
  helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4
  helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79%
  HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3
  HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55%
  95% mean confidence interval for instructions value: -5.67 -5.31
  95% mean confidence interval for instructions %-change: -2.26% -2.16%
  Instructions are helped.

  total cycles in shared programs: 367118526 -> 365895358 (-0.33%)
  cycles in affected programs: 93504145 -> 92280977 (-1.31%)
  helped: 2754
  HURT: 1269
  helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16
  helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12%
  HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9
  HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75%
  95% mean confidence interval for cycles value: -387.31 -220.78
  95% mean confidence interval for cycles %-change: -2.11% -1.68%
  Cycles are helped.

  LOST: 1
  GAINED: 1

  Reviewed-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
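The reassociation in this commit relies on shifts distributing over wrapping addition: (i + k) << s equals (i << s) + (k << s) modulo 2^32. A minimal C sketch of that identity follows; the function names are hypothetical, and it models 32-bit unsigned arithmetic with a shift count below 32 rather than the NIR rule itself.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: add the constant index offset, then scale by shifting. */
static inline uint32_t
add_then_shift(uint32_t i, uint32_t k, unsigned s)
{
    assert(s < 32);
    return (i + k) << s;
}

/* Reassociated form: shift the variable part once, then add the
 * pre-shifted constant. When k and s are compile-time constants,
 * (k << s) folds to a constant, and (i << s) can be shared across all
 * accesses that differ only in k. */
static inline uint32_t
shift_then_add(uint32_t i, uint32_t k, unsigned s)
{
    assert(s < 32);
    return (i << s) + (k << s);
}
```

In the bones[i + 0], bones[i + 1], bones[i + 2] example above, this is why three add-and-shift pairs collapse to one shared shift plus three adds.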
* nir/find_array_copies: Reject copies with mismatched lengths | Andrii Simiklit | 2019-08-14 | 1 | -4/+8
  copy_deref for wildcard dereferences requires the arrays to have the same length; otherwise it leads to a crash in optimizations like 'nir_opt_copy_prop_vars', because these optimizations expect 'copy_deref' only for arrays with the same lengths.
  v2: check was moved to 'try_match_deref' to fix aoa cases (Jason Ekstrand <[email protected]>)
  v3: -fixed comment -the condition merged with other one (Jason Ekstrand <[email protected]>)
  Reviewed-by: Jason Ekstrand <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111286
  Signed-off-by: Andrii Simiklit <[email protected]>
* pan/midgard: Prefix blobber-db output for grepping | Alyssa Rosenzweig | 2019-08-14 | 4 | -6/+14
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Implement blobber-db | Alyssa Rosenzweig | 2019-08-14 | 4 | -8/+47
  We wire through some shader-db-style stats on the current shader in the disassembler so we can get a quick estimate of shader complexity from a trace.
  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Suggested-by: Rob Clark <[email protected]>