summaryrefslogtreecommitdiffstats
path: root/cmd
Commit message (Collapse)AuthorAgeFilesLines
* Multipath autoreplace, control enclosure LEDs, event rate limitingTony Hutter2016-10-197-31/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | 1. Enable multipath autoreplace support for FMA. This extends FMA autoreplace to work with multipath disks. This requires libdevmapper to be installed at build time. 2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure LED for a drive when a drive becomes FAULTED/DEGRADED. Your enclosure must be supported by the Linux SES driver for this to work. The enclosure LED scripts work for multipath devices as well. The scripts will clear the LED when the fault is cleared. 3. Rate limit ZIO delay and checksum events so as not to flood ZED ZIO delay and checksum events are rate limited to 5/sec in the zfs module. Reviewed-by: Richard Laager <[email protected]> Reviewed by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #2449 Closes #3017 Closes #5159
* Fix coverity defects: CID 147643, 152204, 49339GeLiXin2016-10-181-1/+2
| | | | | | | | | | | | | | | | | CID 147643: Type: String not null terminated - make sure that the string is null terminated before strlen and fprintf. CID 152204: Type: Copy into fixed size buffer - since strlcpy isn't availabe here, use strncpy and terminate the string manually. CID 49339: Type: Buffer not null terminated - since strlcpy isn't availabe here, terminate the string manually before fprintf. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5283
* Pass status_cbdata_t to print_status_config() and friendsHåkan Johansson2016-10-171-68/+66
| | | | | | | | | | | | | | | | | | First rename spare_cbdata_t cb -> spare_cb in print_status_config(), to free up cb. Using the structure removes the explicit parameters namewidth and name_flags from several functions. Also use status_cbdata_t for print_import_config(). This simplifies print_logs(). Remove the parameter 'verbose' for print_logs(). It does not really mean verbose, it selected between the print_status_config and print_import_config() paths. This selection is now done by cb_print_config of spare_cbdata_t. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Håkan Johansson <[email protected]> Closes #5259
* Fix coverity defects: CID 147606, 147609cao2016-10-121-1/+4
| | | | | | | | coverity scan CID:147606, Type:resource leak coverity scan CID:147609, Type:resource leak Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5245
* Fix coverity defects: CID 147639GeLiXin2016-10-101-5/+6
| | | | | | | | | When array is passed as a parameter it degenerates into a pointer so the sizeof(path) in is_shorthand_path() and always get return value of 8, instead of the string length we want. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5198
* Fix file permissionsBrian Behlendorf2016-10-081-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following new test cases need to have execute permissions set: userquota/groupspace_003_pos.ksh userquota/userquota_013_pos.ksh userquota/userspace_003_pos.ksh upgrade/upgrade_userobj_001_pos.ksh upgrade/setup.ksh upgrade/cleanup.ksh The following source files accidentally were marked executable: lib/libzpool/kernel.c lib/libshare/nfs.c lib/libzfs/libzfs_dataset.c lib/libzfs/libzfs_util.c tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c cmd/zed/zed_exec.c module/icp/core/kcf_sched.c module/zfs/dsl_pool.c module/zfs/arc.c module/nvpair/nvpair.c man/man5/zfs-module-parameters.5 Reviewed-by: GeLiXin <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #5241
* Fletcher4: Incremental updates and ctx calculationBrian Behlendorf2016-10-071-0/+78
|\ | | | | | | | | | | | | | | | | Fixes ABI issues with fletcher4 code, adds support for incremental updates, and adds ztest method for testing. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #5164
| * Fletcher4: Incremental using SIMDGvozden Neskovic2016-10-051-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Combine incrementally computed fletcher4 checksums. Checksums are combined a posteriori, allowing for parallel computation on chunks to be implemented if required. The algorithm is general, and does not add changes in each SIMD implementation. New test in ztest verifies incremental fletcher computations. Checksum combining matrix for two buffers `a` and `b`, where `Ca` and `Cb` are respective fletcher4 checksums, `Cab` is combined checksum, `s` is size of buffer `b` (divided by sizeof(uint32_t)) is: Cab[A] = Cb[A] + Ca[A] Cab[B] = Cb[B] + Ca[B] + s * Ca[A] Cab[C] = Cb[C] + Ca[C] + s * Ca[B] + s(s+1)/2 * Ca[A] Cab[D] = Cb[D] + Ca[D] + s * Ca[C] + s(s+1)/2 * Ca[B] + s(s+1)(s+2)/6 * Ca[A] NOTE: this calculation overflows for larger buffers. Thus, internally, the calculation is performed on 8MiB chunks. Signed-off-by: Gvozden Neskovic <[email protected]>
* | Add python style checkingBrian Behlendorf2016-10-073-60/+53
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a make recipe for flake8 to enable python style checking. Ensure all python scripts pass flake8. Return an error code of 0 for arcstat.py -v and dbufstat.py -v. Add test cases for python scripts. Reviewed by: Richard Laager <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ian Lee <[email protected]> Closes #5230
| * | Correct exit code for dbufstat -v and arcstat -vGiuseppe Di Natale2016-10-062-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Both scripts were returning an error code of 1 when using the -v argument. -v should exit with an error code of 0. Signed-off-by: Giuseppe Di Natale <[email protected]>
| * | Correct style in arcstat and arc_summaryGiuseppe Di Natale2016-10-062-58/+51
| | | | | | | | | | | | | | | | | | | | | Fix arcstat and arc_summary so they pass flake8 python code style checks. Signed-off-by: Giuseppe Di Natale <[email protected]>
* | | Add support for user/group dnode accounting & quotaJinshan Xiong2016-10-072-13/+78
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tracks dnode usage for each user/group in the DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode accounting have the key prefixed with "obj-" followed by the UID/GID in string format (as done for the block accounting). A new SPA feature has been added for dnode accounting as well as a new ZPL version. The SPA feature must be enabled in the pool before upgrading the zfs filesystem. During the zfs version upgrade, a "quotacheck" will be executed by marking all dnode as dirty. ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500 Signed-off-by: Jinshan Xiong <[email protected]> Signed-off-by: Johann Lombardi <[email protected]>
* | Fix coverity defects: CID 150953, 147603, 147610luozhengzheng2016-10-041-2/+11
|/ | | | | | | | | coverity scan CID:150953,type: uninitialized scalar variable coverity scan CID:147603,type: Resource leak coverity scan CID:147610,type: Resource leak Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5209
* OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-RTony Hutter2016-10-031-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Saso Kiselkov <[email protected]> Reviewed by: Richard Lowe <[email protected]> Approved by: Garrett D'Amore <[email protected]> Ported by: Tony Hutter <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Porting Notes: This code is ported on top of the Illumos Crypto Framework code: https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d The list of porting changes includes: - Copied module/icp/include/sha2/sha2.h directly from illumos - Removed from module/icp/algs/sha2/sha2.c: #pragma inline(SHA256Init, SHA384Init, SHA512Init) - Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since it now takes in an extra parameter. - Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c - Added skein & edonr to libicp/Makefile.am - Added sha512.S. It was generated from sha512-x86_64.pl in Illumos. - Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument. - In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section to not #include the non-existant endian.h. - In skein_test.c, renane NULL to 0 in "no test vector" array entries to get around a compiler warning. - Fixup test files: - Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>, - Remove <note.h> and define NOTE() as NOP. - Define u_longlong_t - Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p" - Rename NULL to 0 in "no test vector" array entries to get around a compiler warning. - Remove "for isa in $($ISAINFO); do" stuff - Add/update Makefiles - Add some userspace headers like stdio.h/stdlib.h in places of sys/types.h. - EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules. - Update scripts/zfs2zol-patch.sed - include <sys/sha2.h> in sha2_impl.h - Add sha2.h to include/sys/Makefile.am - Add skein and edonr dirs to icp Makefile - Add new checksums to zpool_get.cfg - Move checksum switch block from zfs_secpolicy_setprop() to zfs_check_settable() - Fix -Wuninitialized error in edonr_byteorder.h on PPC - Fix stack frame size errors on ARM32 - Don't unroll loops in Skein on 32-bit to save stack space - Add memory barriers in sha2.c on 32-bit to save stack space - Add filetest_001_pos.ksh checksum sanity test - Add option to write psudorandom data in file_write utility
* Add parity generation/rebuild using 128-bits NEON for Aarch64Romain Dolbeau2016-10-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-use the framework established for SSE2, SSSE3 and AVX2. However, GCC is using FP registers on Aarch64, so unlike SSE/AVX2 we can't rely on the registers being left alone between ASM statements. So instead, the NEON code uses C variables and GCC extended ASM syntax. Note that since the kernel explicitly disable vector registers, they have to be locally re-enabled explicitly. As we use the variable's number to define the symbolic name, and GCC won't allow duplicate symbolic names, numbers have to be unique. Even when the code is not going to be used (e.g. the case for 4 registers when using the macro with only 2). Only the actually used variables should be declared, otherwise the build will fails in debug mode. This requires the replacement of the XOR(X,X) syntax by a new ZERO(X) macro, which does the same thing but without repeating the argument. And perhaps someday there will be a machine where there is a more efficient way to zero a register than XOR with itself. This affects scalar, SSE2, SSSE3 and AVX2 as they need the new macro. It's possible to write faster implementations (different scheduling, different unrolling, interleaving NEON and scalar, ...) for various cores, but this one has the advantage of fitting in the current state of the code, and thus is likely easier to review/check/merge. The only difference between aarch64-neon and aarch64-neonx2 is that aarch64-neonx2 unroll some functions some more. Reviewed-by: Gvozden Neskovic <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Romain Dolbeau <[email protected]> Closes #4801
* Fix coverity defects: CID 147448, 147449, 147450, 147453, 147454luozhengzheng2016-10-022-2/+2
| | | | | | | | | | | coverity scan CID:147448,type: unchecked return value coverity scan CID:147449,type: unchecked return value coverity scan CID:147450,type: unchecked return value coverity scan CID:147453,type: unchecked return value coverity scan CID:147454,type: unchecked return value Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5206
* Fix coverity defects: CID 147536, 147537, 147538GeLiXin2016-09-301-4/+5
| | | | | | | | | | | | coverity scan CID:147536, type: Argument cannot be negative - may write or close fd which is negative coverity scan CID:147537, type: Argument cannot be negative - may call dup2 with a negative fd coverity scan CID:147538, type: Argument cannot be negative - may read or fchown with a negative fd Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: GeLiXin <[email protected]> Closes #5185
* raidz_test: respect wall timeGvozden Neskovic2016-09-302-10/+28
| | | | | | | | When timeout is specified (-t), stop worker threads in the middle of work units. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Issue #5180 Closes #5190
* Fix coverity defects: CID 147443, 147656, 147655, 147441, 147653BearBabyLiu2016-09-292-2/+2
| | | | | | | | | | | | coverity scan CID:147443, Type: Buffer not null terminated coverity scan CID:147656, Type: Copy into fixed size buffer coverity scan CID:147655, Type: Copy into fixed size buffer coverity scan CID:147441, Type: Buffer not null terminated coverity scan CID:147653, Type: Copy into fixed size buffer Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: liuhuang <[email protected]> Closes #5165
* Fix coverity defects: CID 147610, 147608, 147607cao2016-09-292-16/+19
| | | | | | | | | | coverity scan CID:147610, Type: Resource leak. coverity scan CID:147608, Type: Resource leak. coverity scan CID:147607, Type: Resource leak. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5143
* Fix coverity defects: CID 147602 147604cao2016-09-231-4/+8
| | | | | | | | | coverity scan CID:147604, Type: Resource leak. coverity scan CID:147602, Type: Resource leak. reason: safe_malloc calcvs, goto children but not free calcvs. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5155
* Fix coverity defects: CID 147613 147614 147616 147617luozhengzheng2016-09-231-4/+8
| | | | | | | | | | coverity scan CID:147617,type: resource leaks coverity scan CID:147616,type: resource leaks coverity scan CID:147614,type: resource leaks coverity scan CID:147613,type: resource leaks Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5150
* Fix coverity defectsluozhengzheng2016-09-223-14/+57
| | | | | | | | | | | | | | | | | | | | | | | 1.coverity scan CID:147445 function zfs_do_send in zfs_main.c Buffer not null terminated (BUFFER_SIZE_WARNING) 2.coverity scan CID:147443 function zfs_do_bookmark in zfs_main.c Buffer not null terminated (BUFFER_SIZE_WARNING) 3.coverity scan CID:147660 function main in zinject.c Passing string argv[0] of unknown size to strcpy By the way, the leak of g_zfs is fixed. 4.coverity scan CID: 147442 function make_disks in zpool_vdev.c Buffer not null terminated (BUFFER_SIZE_WARNING) 5.coverity scan CID: 147661 function main in dir_rd_update.c passing string cp1 of unknown size to strcpy Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5130
* Fix coverity defectscao2016-09-202-1/+4
| | | | | | | | | | | | | | | | Fix coverity defects: coverity scan CID:147623, Type: Resource leak. coverity scan CID:147622, Type: Resource leak. reason: zpool_open zhp, but not zpool_close zhp. so resource leak. coverity scan CID:147621, Type: Resource fd leak. coverity scan CID:147620, Type: Resource fd leak. reason: do_write do_read open file fd,but exception not close fd. delete unuse definition DMU_OS_IS_L2COMPRESSIBLE. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: cao.xuewen <[email protected]> Closes #5137
* Change /etc/mtab to /proc/self/mountsslashdd2016-09-203-19/+20
| | | | | | | | | | | Fix misleading error message: "The /dev/zfs device is missing and must be created.", if /etc/mtab is missing. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Eric Desrochers <[email protected]> Closes #4680 Closes #5029
* Fix Coverity defectsluozhengzheng2016-09-171-1/+1
| | | | | | | CID 147659, 150952 and 147645 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5103
* DLPX-40252 integrate EP-476 compressed zfs send/receiveDan Kimmel2016-09-132-10/+29
| | | | | | | | Authored by: Dan Kimmel <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Ported by: David Quigley <[email protected]> Issue #5078
* OpenZFS 6950 - ARC should cache compressed dataGeorge Wilson2016-09-132-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Tom Caputi <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Ported by: David Quigley <[email protected]> This review covers the reading and writing of compressed arc headers, sharing data between the arc_hdr_t and the arc_buf_t, and the implementation of a new dbuf cache to keep frequently access data uncompressed. I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block for that DVA. The physical block may or may not be compressed. If compressed arc is enabled and the block on-disk is compressed, then the b_pdata will match the block on-disk and remain compressed in memory. If the block on disk is not compressed, then neither will the b_pdata. Lastly, if compressed arc is disabled, then b_pdata will always be an uncompressed version of the on-disk block. Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict any arc_buf_t's that are no longer referenced. This means that the arc will primarily have compressed blocks as the arc_buf_t's are considered overhead and are always uncompressed. When a consumer reads a block we first look to see if the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If the hdr already has a arc_buf_t, then we will allocate an additional arc_buf_t and bcopy the uncompressed contents from the first arc_buf_t to the new one. Writing to the compressed arc requires that we first discard the b_pdata since the physical block is about to be rewritten. The new data contents will be passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we will copy the physical block contents to a newly allocated b_pdata. When an l2arc is inuse it will also take advantage of the b_pdata. Now the l2arc will always write the contents of b_pdata to the l2arc. This means that when compressed arc is enabled that the l2arc blocks are identical to those stored in the main data pool. This provides a significant advantage since we can leverage the bp's checksum when reading from the l2arc to determine if the contents are valid. If the compressed arc is disabled, then we must first transform the read block to look like the physical block in the main data pool before comparing the checksum and determining it's valid. OpenZFS-issue: https://www.illumos.org/issues/6950 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7fc10f0 Issue #5078
* Fix memleak in zfs_do_* and zpool_do_*luozhengzheng2016-09-122-15/+58
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: luozhengzheng <[email protected]> Closes #5056
* Allow ZVOL bookmarks to be listed recursivelyloli10K2016-09-121-3/+3
| | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #4503 Closes #5072
* Fix memory/fd leak in check_file() and is_spare()liuhuang2016-09-121-2/+7
| | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: liuhuang <[email protected]> Closes #5085
* Bring over illumos ZFS FMA logic -- phase 1Don Brady2016-09-0110-3/+1497
| | | | | | | | | | | | | This first phase brings over the ZFS SLM module, zfs_mod.c, to handle auto operations in response to disk events. Disk event monitoring is provided from libudev and generates the expected payload schema for zfs_mod. This work leverages the recently added devid and phys_path strings in the vdev label. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #4673
* Fix zhack argument processingBrian Behlendorf2016-08-311-4/+5
| | | | | | | | | | | | | | The argument processing is zhack makes the assumption that getopt() will not permute argv. This isn't true for the GNU implementation of getopt() unless the optstring is prefixed with a '+'. In which case this is equivalent to setting the POSIXLY_CORRECT environment variable In addition, update the usage() and optstrings to reflect the existing supported options. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: liaoyuxiangqin <[email protected]> Closes #5047
* zdb: fencepost error at zdb_cb.zcb_embedded_histogram[][]Gvozden Neskovic2016-08-161-1/+1
| | | | | | | | | | | | Erroneous access detected by gcc UndefinedBehaviorSanitizer: `zdb.c:2424:7: runtime error: index 112 out of bounds for type 'uint64_t [112]'` Fix: increase histogram size by 1 to accommodate all possible sizes. Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4934 Issue #4883
* OpenZFS 5997 - FRU field not set during pool creation and never updatedHans Rosenfeld2016-08-125-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Authored by: Hans Rosenfeld <[email protected]> Reviewed by: Dan Fields <[email protected]> Reviewed by: Josef Sipek <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Signed-off-by: Don Brady <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/5997 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1437283 Porting Notes: In addition to the OpenZFS changes this patch realigns the events with those found in OpenZFS. Events which would be logged as sysevents on illumos have been been mapped to the 'sysevent' class for Linux. In addition, several subclass names have been changed to match what is used in OpenZFS. In all cases this means a '.' was changed to an '_' in the subclass. The scripts provided by ZoL have been updated, however users which provide scripts for any of the following events will need to rename them based on the new subclass names. ereport.fs.zfs.config.sync sysevent.fs.zfs.config_sync ereport.fs.zfs.zpool.destroy sysevent.fs.zfs.pool_destroy ereport.fs.zfs.zpool.reguid sysevent.fs.zfs.pool_reguid ereport.fs.zfs.vdev.remove sysevent.fs.zfs.vdev_remove ereport.fs.zfs.vdev.clear sysevent.fs.zfs.vdev_clear ereport.fs.zfs.vdev.check sysevent.fs.zfs.vdev_check ereport.fs.zfs.vdev.spare sysevent.fs.zfs.vdev_spare ereport.fs.zfs.vdev.autoexpand sysevent.fs.zfs.vdev_autoexpand ereport.fs.zfs.resilver.start sysevent.fs.zfs.resilver_start ereport.fs.zfs.resilver.finish sysevent.fs.zfs.resilver_finish ereport.fs.zfs.scrub.start sysevent.fs.zfs.scrub_start ereport.fs.zfs.scrub.finish sysevent.fs.zfs.scrub_finish ereport.fs.zfs.bootfs.vdev.attach sysevent.fs.zfs.bootfs_vdev_attach
* Fix infinite loop when zdb -R with d flagChunwei Chen2016-08-111-2/+2
| | | | | | | | | Also print decompress progress to stderr so it wouldn't pollute raw output with r flag. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4956
* Fix gcc -Warray-bounds check for dump_object() in zdbBrian Behlendorf2016-08-021-5/+14
| | | | | | | | | | | | | | | | As of gcc 6.1.1 20160621 (Red Hat 6.1.1-3) an array bounds warnings is detected in the zdb the dump_object() function. The analysis is correct but difficult to interpret because this is implemented as a macro. Rework the ZDB_OT_NAME in to a function and remove the case detected by gcc which is a side effect of the DMU_OT_IS_VALID() macro. zdb.c: In function ‘dump_object’: zdb.c:1931:288: error: array subscript is outside array bounds [-Werror=array-bounds] Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #4907
* ztest: memory leaks reported by AddressSanitizerGvozden Neskovic2016-07-291-1/+4
| | | | | | | | | | | | | | | | Leaks reported by using AddressSanitizer, GCC 6.1.0 Direct leak of 4097 byte(s) in 1 object(s) allocated from: #1 0x414f73 in process_options cmd/ztest/ztest.c:721 Direct leak of 5440 byte(s) in 17 object(s) allocated from: #1 0x41bfd5 in umem_alloc ../../lib/libspl/include/umem.h:88 #2 0x41bfd5 in ztest_zap_parallel cmd/ztest/ztest.c:4659 #3 0x4163a8 in ztest_execute cmd/ztest/ztest.c:5907 Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4896
* Fix zdb crash with 4K-only devicesBrian Behlendorf2016-07-271-3/+13
| | | | | | | | | | | | Here's the problem - on 4K native devices in userland on Linux using O_DIRECT, buffers must be 4K aligned or I/O will fail with EINVAL, causing zdb (and others) to coredump. Since userland probably doesn't need optimized buffer caches, we just force 4K alignment on everything. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes #4479
* Multi-thread 'zpool import' for blkidBrian Behlendorf2016-07-271-19/+10
| | | | | | | | | | | | | | | | | | | | | | | | | Commit 519129f added support to multi-thread 'zpool import' for the case where block devices are scanned for under /dev/. This commit generalizes that logic and applies it to the case where device names are acquired from libblkid. The zpool_find_import_scan() and zpool_find_import_blkid() functions create an AVL tree containing each device name. Each entry in this tree is dispatched to a taskq where the function zpool_open_func() validates the device by opening it and reading the label. This may result in additional entries being added to the tree and those device paths being verified. This is largely how the upstream OpenZFS code behaves but due to significant differences the non-Linux code has been dropped for readability. Additionally, this code makes use of taskqs and kmutexs which are normally not available to the command line tools. Special care has been taken to allow their use in the import functions. Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #4794
* Fixes for issues found with cppcheck toolGvozden Neskovic2016-07-274-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | The patch fixes small number of errors/false positives reported by `cppcheck`, static analysis tool for C/C++. cppcheck 1.72 $ cppcheck . --force --quiet [cmd/zfs/zfs_main.c:4444]: (error) Possible null pointer dereference: who_perm [cmd/zfs/zfs_main.c:4445]: (error) Possible null pointer dereference: who_perm [cmd/zfs/zfs_main.c:4446]: (error) Possible null pointer dereference: who_perm [cmd/zpool/zpool_iter.c:317]: (error) Uninitialized variable: nvroot [cmd/zpool/zpool_vdev.c:1526]: (error) Memory leak: child [lib/libefi/rdwr_efi.c:1118]: (error) Memory leak: efi_label [lib/libuutil/uu_misc.c:207]: (error) va_list 'args' was opened but not closed by va_end(). [lib/libzfs/libzfs_import.c:1554]: (error) Dangerous usage of 'diskname' (strncpy doesn't always null-terminate it). [lib/libzfs/libzfs_sendrecv.c:3279]: (error) Dereferencing 'cp' after it is deallocated / released [tests/zfs-tests/cmd/file_write/file_write.c:154]: (error) Possible null pointer dereference: operation [tests/zfs-tests/cmd/randfree_file/randfree_file.c:90]: (error) Memory leak: buf [cmd/zinject/zinject.c:1068]: (error) Uninitialized variable: dataset [module/icp/io/sha2_mod.c:698]: (error) Uninitialized variable: blocks_per_int64 Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #1392
* Fixes and enhancements of SIMD raidz parityGvozden Neskovic2016-07-191-20/+6
| | | | | | | | | | | | | | | | | | | | | | | - Implementation lock replaced with atomic variable - Trailing whitespace is removed from user specified parameter, to enhance experience when using commands that add newline, e.g. `echo` - raidz_test: remove dependency on `getrusage()` and RUSAGE_THREAD, Issue #4813 - silence `cppcheck` in vdev_raidz, partial solution of Issue #1392 - Minor fixes and cleanups - Enable use of original parity methods in [fastest] configuration. New opaque original ops structure, representing native methods, is added to supported raidz methods. Original parity methods are executed if selected implementation has NULL fn pointer. Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue #4813 Issue #1392
* Update arc_summary.py for prefetch changesBrian Behlendorf2016-07-191-146/+1
| | | | | | | | | | | Commit 7f60329 removed several kstats which arc_summary.py read. Remove these kstats from arc_summary.py in the same way this was handled in FreeNAS. FreeNAS-commit: https://github.com/freenas/freenas/commit/3901f73 Signed-off-by: Brian Behlendorf <[email protected]> Closes #4695
* Add RAID-Z routines for SSE2 instruction set, in x86_64 mode.Gvozden Neskovic2016-07-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch covers low-end and older x86 CPUs. Parity generation is equivalent to SSSE3 implementation, but reconstruction is somewhat slower. Previous 'sse' implementation is renamed to 'ssse3' to indicate highest instruction set used. Benchmark results: scalar_rec_p 4 720476442 scalar_rec_q 4 187462804 scalar_rec_r 4 138996096 scalar_rec_pq 4 140834951 scalar_rec_pr 4 129332035 scalar_rec_qr 4 81619194 scalar_rec_pqr 4 53376668 sse2_rec_p 4 2427757064 sse2_rec_q 4 747120861 sse2_rec_r 4 499871637 sse2_rec_pq 4 522403710 sse2_rec_pr 4 464632780 sse2_rec_qr 4 319124434 sse2_rec_pqr 4 205794190 ssse3_rec_p 4 2519939444 ssse3_rec_q 4 1003019289 ssse3_rec_r 4 616428767 ssse3_rec_pq 4 706326396 ssse3_rec_pr 4 570493618 ssse3_rec_qr 4 400185250 ssse3_rec_pqr 4 377541245 original_rec_p 4 691658568 original_rec_q 4 195510948 original_rec_r 4 26075538 original_rec_pq 4 103087368 original_rec_pr 4 15767058 original_rec_qr 4 15513175 original_rec_pqr 4 10746357 Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4783
* OpenZFS 6314 - buffer overflow in dsl_dataset_nameIgor Kozhukhov2016-06-285-85/+83
| | | | | | | | | | | Reviewed by: George Wilson <[email protected]> Reviewed by: Prakash Surya <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/6314 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d6160ee
* OpenZFS 2605, 6980, 6902Matthew Ahrens2016-06-282-20/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2605 want to resume interrupted zfs send Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Xin Li <[email protected]> Reviewed by: Arne Jansen <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: kernelOfTruth <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/2605 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12 6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/6980 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f Porting notes: - All rsend and snapshop tests enabled and updated for Linux. - Fix misuse of input argument in traverse_visitbp(). - Fix ISO C90 warnings and errors. - Fix gcc 'missing braces around initializer' in 'struct send_thread_arg to_arg =' warning. - Replace 4 argument fletcher_4_native() with 3 argument version, this change was made in OpenZFS 4185 which has not been ported. - Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream. - Fix mktree xattr creation, 'user.' prefix required. - Minor fixes to newly enabled test cases - Long holds for volumes allowed during receive for minor registration.
* Implement large_dnode pool featureNed Bass2016-06-243-45/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced to from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill bocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interfaces names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by an large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #3542
* Revert "Add a test case for dmu_free_long_range() to ztest"Brian Behlendorf2016-06-241-178/+0
| | | | | | | | | | This reverts commit d0de2e82df579f4e4edf5643b674a1464fae485f which introduced a new test case to ztest which is failing occasionally during automated testing. The change is being reverted until the issue can be fully investigated. Signed-off-by: Brian Behlendorf <[email protected]> Issue #4754
* Add a test case for dmu_free_long_range() to ztestBoris Protopopov2016-06-211-0/+178
| | | | | | Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4754
* SIMD implementation of vdev_raidz generate and reconstruct routinesGvozden Neskovic2016-06-216-1/+1140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #4328