summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* OpenZFS 9166 - zfs storage pool checkpointSerapheim Dimitropoulos2018-06-2622-51/+229
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Details about the motivation of this feature and its usage can be found in this blogpost: https://sdimitro.github.io/post/zpool-checkpoint/ A lightning talk of this feature can be found here: https://www.youtube.com/watch?v=fPQA8K40jAM Implementation details can be found in big block comment of spa_checkpoint.c Side-changes that are relevant to this commit but not explained elsewhere: * renames members of "struct metaslab trees to be shorter without losing meaning * space_map_{alloc,truncate}() accept a block size as a parameter. The reason is that in the current state all space maps that we allocate through the DMU use a global tunable (space_map_blksz) which defauls to 4KB. This is ok for metaslab space maps in terms of bandwirdth since they are scattered all over the disk. But for other space maps this default is probably not what we want. Examples are device removal's vdev_obsolete_sm or vdev_chedkpoint_sm from this review. Both of these have a 1:1 relationship with each vdev and could benefit from a bigger block size. Porting notes: * The part of dsl_scan_sync() which handles async destroys has been moved into the new dsl_process_async_destroys() function. * Remove "VERIFY(!(flags & FWRITE))" in "kernel.c" so zhack can write to block device backed pools. * ZTS: * Fix get_txg() in zpool_sync_001_pos due to "checkpoint_txg". * Don't use large dd block sizes on /dev/urandom under Linux in checkpoint_capacity. * Adopt Delphix-OS's setting of 4 (spa_asize_inflation = SPA_DVAS_PER_BP + 1) for the checkpoint_capacity test to speed its attempts to fill the pool * Create the base and nested pools with sync=disabled to speed up the "setup" phase. * Clear labels in test pool between checkpoint tests to avoid duplicate pool issues. * The import_rewind_device_replaced test has been marked as "known to fail" for the reasons listed in its DISCLAIMER. * New module parameters: zfs_spa_discard_memory_limit, zfs_remove_max_bytes_pause (not documented - debugging only) vdev_max_ms_count (formerly metaslabs_per_vdev) vdev_min_ms_count Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: John Kennedy <[email protected]> Reviewed by: Dan Kimmel <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Richard Lowe <[email protected]> Ported-by: Tim Chase <[email protected]> Signed-off-by: Tim Chase <[email protected]> OpenZFS-issue: https://illumos.org/issues/9166 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7159fdb8 Closes #7570
* Linux 4.14 compat: blk_queue_stackable()Brian Behlendorf2018-06-191-11/+0
| | | | | | | | | | | | | | | | | | | | | The blk_queue_stackable() function was replaced in the 4.14 kernel by queue_is_rq_based(), commit torvalds/linux@5fdee212. This change resulted in the default elevator being used which can negatively impact performance. Rather than adding additional compatibility code to detect the new interface unconditionally attempt to set the elevator. Since we expect this to fail for block devices without an elevator the error message has been moved in to zfs_dbgmsg(). Finally, it was observed that the elevator_change() was removed from the 4.12 kernel, commit torvalds/linux@c033269. Update the comment to clearly specify which are expected to export the elevator_change() symbol. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7645
* Linux 4.18 compat: inode timespec -> timespec64Brian Behlendorf2018-06-1912-34/+63
| | | | | | | | | | | | | | | | | | | | | | | | Commit torvalds/linux@95582b0 changes the inode i_atime, i_mtime, and i_ctime members form timespec's to timespec64's to make them 2038 safe. As part of this change the current_time() function was also updated to return the timespec64 type. Resolve this issue by introducing a new inode_timespec_t type which is defined to match the timespec type used by the inode. It should be used when working with inode timestamps to ensure matching types. The timestruc_t type under Illumos was used in a similar fashion but was specified to always be a timespec_t. Rather than incorrectly define this type all timespec_t types have been replaced by the new inode_timespec_t type. Finally, the kernel and user space 'sys/time.h' headers were aligned with each other. They define as appropriate for the context several constants as macros and include static inline implementation of gethrestime(), gethrestime_sec(), and gethrtime(). Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7643
* Add ASSERT to debug encryption key mapping issuesTom Caputi2018-06-181-0/+1
| | | | | | | | | | | This patch simply adds an ASSERT that confirms that the last decrypting reference on a dataset waits until the dataset is no longer dirty. This should help to debug issues where the ZIO layer cannot find encryption keys after a dataset has been disowned. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7637
* Add tunables for channel programsJohn Gallagher2018-06-151-3/+3
| | | | | | | | | | | | This patch adds tunables for modifying the maximum memory limit and maximum instruction limit that can be specified when running a channel program. Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Brian Behlendorf <[email protected] Reviewed-by: Sara Hartse <[email protected]> Signed-off-by: John Gallagher <[email protected]> External-issue: LX-1085 Closes #7618
* Linux compat 4.18: check_disk_size_change()Brian Behlendorf2018-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added support for the bops->check_events() interface which was added in the 2.6.38 kernel to replace bops->media_changed(). Fully implementing this functionality allows the volume resize code to rely on revalidate_disk(), which is the preferred mechanism, and removes the need to use check_disk_size_change(). In order for bops->check_events() to lookup the zvol_state_t stored in the disk->private_data the zvol_state_lock needs to be held. Since the check events interface may poll the mutex has been converted to a rwlock for better concurrently. The rwlock need only be taken as a writer in the zvol_free() path when disk->private_data is set to NULL. The configure checks for the block_device_operations structure were consolidated in a single kernel-block-device-operations.m4 file. The ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS configure checks and assoicated dead code was removed. This interface was added to the 2.6.28 kernel which predates the oldest supported 2.6.32 kernel and will therefore always be available. Updated maximum Linux version in META file. The 4.17 kernel was released on 2018-06-03 and ZoL is compatible with the finalized kernel. Reviewed-by: Boris Protopopov <[email protected]> Reviewed-by: Sara Hartse <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7611
* Reserve DMU_BACKUP_FEATURE for ZSTDAllan Jude2018-06-141-0/+1
| | | | | | | | | Reserve bit 25 for the ZSTD compression feature from FreeBSD. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Signed-off-by: Allan Jude <[email protected]> Closes #7626
* OpenZFS 9577 - remove zfs_dbuf_evict_key tsdMatthew Ahrens2018-06-132-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary - we can instead pass a flag down in a few places to prevent recursive dbuf eviction. Making this change has 3 benefits: 1. The code semantics are easier to understand. 2. On Linux, performance is improved, because creating/removing TSD values (by setting to NULL vs non-NULL) is expensive, and we do it very often. 3. According to Nexenta, the current semantics can cause a deadlock when concurrently calling dmu_objset_evict_dbufs() (which is rare today, but they are working on a "parallel unmount" change that triggers this more easily): Porting Notes: * Minor conflict with OpenZFS 9337 which has not yet been ported. Authored by: Matthew Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed-by: George Melikov <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9577 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645 External-issue: DLPX-58547 Closes #7602
* Raw receive functions must not decrypt dataTom Caputi2018-06-061-1/+2
| | | | | | | | | | | | | | | | This patch fixes a small bug found where receive_spill() sometimes attempted to decrypt spill blocks when doing a raw receive. In addition, this patch fixes another small issue in arc_buf_fill()'s error handling where a decryption failure (which could be caused by the first bug) would attempt to set the arc header's IO_ERROR flag without holding the header's lock. Reviewed-by: Matthew Thode <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7564 Closes #7584 Closes #7592
* OpenZFS 8484 - Implement aggregate sum and use for arc countersPaul Dagnelie2018-06-064-0/+104
| | | | | | | | | | | | | | | | | | | In pursuit of improving performance on multi-core systems, we should implements fanned out counters and use them to improve the performance of some of the arc statistics. These stats are updated extremely frequently, and can consume a significant amount of CPU time. Authored by: Paul Dagnelie <[email protected]> Reviewed by: Pavel Zakharov <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Ported-by: Paul Dagnelie <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8484 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7028a8b92b7 Issue #3752 Closes #7462
* Add pool state /proc entry, "SUSPENDED" poolsTony Hutter2018-06-063-4/+8
| | | | | | | | | | | | | | | | | | | 1. Add a proc entry to display the pool's state: $ cat /proc/spl/kstat/zfs/tank/state ONLINE This is done without using the spa config locks, so it will never hang. 2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools. Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Richard Elling <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7331 Closes #7563
* Remove rwlock wrappersBrian Behlendorf2018-06-041-1/+0
| | | | | | | | | | | | | | | | | | | The only remaining consumer of the rwlock compatibility wrappers is ztest. Remove the wrappers and convert the few remaining calls to the underlying pthread functions. rwlock_init() -> pthread_rwlock_init() rwlock_destroy() -> pthread_rwlock_destroy() rw_rdlock() -> pthread_rwlock_rdlock() rw_wrlock() -> pthread_rwlock_wrlock() rw_unlock() -> pthread_rwlock_unlock() Note pthread_rwlock_init() defaults to PTHREAD_PROCESS_PRIVATE which is equivilant to the USYNC_THREAD behavior. There is no functional change. Reviewed-by: Olaf Faaland <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7591
* OpenZFS 9464 - txg_kick() fails to see that we are quiescingSerapheim Dimitropoulos2018-06-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | txg_kick() fails to see that we are quiescing, forcing transactions to their next stages without leaving them accumulate changes Creating a fragmented pool in a DCenter VM and continuously writing to it with multiple instances of randwritecomp, we get the following output from txg.d: 0ms 311MB in 4114ms (95% p1) 75MB/s 544MB (76%) 336us 153ms 0ms 0ms 8MB in 51ms ( 0% p1) 163MB/s 474MB (66%) 129us 34ms 0ms 0ms 366MB in 4454ms (93% p1) 82MB/s 572MB (79%) 498us 20ms 0ms 0ms 406MB in 5212ms (95% p1) 77MB/s 591MB (82%) 661us 37ms 0ms 0ms 340MB in 5110ms (94% p1) 66MB/s 622MB (86%) 1048us 41ms 1ms 0ms 3MB in 61ms ( 0% p1) 51MB/s 419MB (58%) 33us 0ms 0ms 0ms 361MB in 3555ms (88% p1) 101MB/s 542MB (75%) 335us 40ms 0ms 0ms 356MB in 4592ms (92% p1) 77MB/s 561MB (78%) 430us 89ms 1ms 0ms 11MB in 129ms (13% p1) 90MB/s 507MB (70%) 222us 15ms 0ms 0ms 281MB in 2520ms (89% p1) 111MB/s 542MB (75%) 334us 42ms 0ms 0ms 383MB in 3666ms (91% p1) 104MB/s 557MB (77%) 411us 133ms 0ms 0ms 404MB in 5757ms (94% p1) 70MB/s 635MB (88%) 1274us 123ms 2ms 4ms 367MB in 4172ms (89% p1) 88MB/s 556MB (77%) 401us 51ms 0ms 0ms 42MB in 470ms (44% p1) 90MB/s 557MB (77%) 412us 43ms 0ms 0ms 261MB in 2273ms (88% p1) 114MB/s 556MB (77%) 407us 27ms 0ms 0ms 394MB in 3646ms (85% p1) 108MB/s 552MB (77%) 393us 304ms 0ms 0ms 275MB in 2416ms (89% p1) 113MB/s 510MB (71%) 200us 53ms 0ms 0ms 9MB in 53ms ( 0% p1) 169MB/s 483MB (67%) 140us 100ms 1ms The TXGs that are getting synced and don't have lots of changes are pushed by txg_kick() which basically forces the current open txg to get to the quiesced state: if (tx->tx_syncing_txg == 0 && tx->tx_quiesce_txg_waiting <= tx->tx_open_txg && tx->tx_sync_txg_waiting <= tx->tx_synced_txg && tx->tx_quiesced_txg <= tx->tx_synced_txg) { tx->tx_quiesce_txg_waiting = tx->tx_open_txg + 1; cv_broadcast(&tx->tx_quiesce_more_cv); } The problem is that the above code doesn't check if we are currently quiescing anything (only if a quiesce or a sync has been requested, ..etc) so the following scenario can happen: 1] We have an open txg A that had enough dirty data (more than zfs_dirty_data_sync) and it was pushed to the quiesced state, and opened a new txg B. No txg is currently being synced. 2] Immediately after the opening of B, txg_kick() was run by some other write (and because of A's dirty data) and saw that we are not currently syncing any txg and no one has requested quiescing so it requests one by bumping tx_quiesce_txg_waiting and broadcasts the quiesce thread. 3] The quiesce thread just passed txg A to be synced and sees that a quiescing request has been sent to it so it immediately grabs B without letting it gather enough data, putting it in a quiesced state and opening a new txg C. In this scenario txg B, is an example of how the entries of interest show up in the txg.d output. Ideally we would like txg_kick() to get triggered only when we are sure that we are not syncing AND not quiescing any txg. This way we can kick an open TXG to the quiescing state when we are sure that there is nothing going on and we would benefit from the different states running concurrently. Authored by: Serapheim Dimitropoulos <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: Brad Lewis <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9464 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1cd7635b Closes #7587
* OpenZFS 9235 - rename zpool_rewind_policy_t to zpool_load_policy_tPavel Zakharov2018-06-043-16/+16
| | | | | | | | | | | | | | | | | | | | | | | We want to be able to pass various settings during import/open of a pool, which are not only related to rewind. Instead of adding a new policy and duplicate a bunch of code, we should just rename rewind_policy to a more generic term like load_policy. For instance, we'd like to set spa->spa_import_flags from the nvlist, rather from a flags parameter passed to spa_import as in some cases we want those flags not only for the import case, but also for the open case. One such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would allow zfs to open a pool when logs are missing. Authored by: Pavel Zakharov <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Approved by: Robert Mustacchi <[email protected]> Ported-by: Brian Behlendorf <[email protected]> OpenZFS-issue: https://illumos.org/issues/9235 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d2b1e44 Closes #7532
* zpool reopen should detect expanded devicesSara Hartse2018-05-311-0/+12
| | | | | | | | | | | | | | | | | | | | Update bdev_capacity to have wholedisk vdevs query the size of the underlying block device (correcting for the size of the efi parition and partition alignment) and therefore detect expanded space. Correct vdev_get_stats_ex so that the expandsize is aligned to metaslab size and new space is only reported if it is large enough for a new metaslab. Reviewed by: Don Brady <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed by: John Wren Kennedy <[email protected]> Signed-off-by: sara hartse <[email protected]> External-issue: LX-165 Closes #7546 Issue #7582
* Update build system and packagingBrian Behlendorf2018-05-2925-34/+226
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minimal changes required to integrate the SPL sources in to the ZFS repository build infrastructure and packaging. Build system and packaging: * Renamed SPL_* autoconf m4 macros to ZFS_*. * Removed redundant SPL_* autoconf m4 macros. * Updated the RPM spec files to remove SPL package dependency. * The zfs package obsoletes the spl package, and the zfs-kmod package obsoletes the spl-kmod package. * The zfs-kmod-devel* packages were updated to add compatibility symlinks under /usr/src/spl-x.y.z until all dependent packages can be updated. They will be removed in a future release. * Updated copy-builtin script for in-kernel builds. * Updated DKMS package to include the spl.ko. * Updated stale AUTHORS file to include all contributors. * Updated stale COPYRIGHT and included the SPL as an exception. * Renamed README.markdown to README.md * Renamed OPENSOLARIS.LICENSE to LICENSE. * Renamed DISCLAIMER to NOTICE. Required code changes: * Removed redundant HAVE_SPL macro. * Removed _BOOT from nvpairs since it doesn't apply for Linux. * Initial header cleanup (removal of empty headers, refactoring). * Remove SPL repository clone/build from zimport.sh. * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due to build issues when forcing C99 compilation. * Replaced legacy ACCESS_ONCE with READ_ONCE. * Include needed headers for `current` and `EXPORT_SYMBOL`. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: Matthew Ahrens <[email protected]> Reviewed-by: Pavel Zakharov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> TEST_ZIMPORT_SKIP="yes" Closes #7556
* Merge branch 'zfsonlinux/merge-spl'Brian Behlendorf2018-05-2955-0/+4946
|\ | | | | | | | | | | | | | | | | | | Merge a minimal version of the zfsonlinux/spl repository in to the zfsonlinux/zfs repository. Care was taken to prevent file conflicts when merging and to preserve the spl repository history. The spl kernel module remains under the GPLv2 license as documented by the additional THIRDPARTYLICENSE.gplv2 file. Signed-off-by: Brian Behlendorf <[email protected]>
| * Prepare SPL repo to merge with ZFS repoBrian Behlendorf2018-05-29136-3020/+233
| | | | | | | | | | | | | | | | | | This commit removes everything from the repository except the core SPL implementation for Linux. Those files which remain have been moved to non-conflicting locations to facilitate the merge. The README.md and associated files have been updated accordingly. Signed-off-by: Brian Behlendorf <[email protected]>
| * Linux compat 4.16: SECTOR_SIZEGiuseppe Di Natale2018-04-091-0/+6
| | | | | | | | | | | | | | | | | | | | As of https://github.com/torvalds/linux/commit/233bde21, SECTOR_SIZE is defined in linux/blkdev.h. Define SECTOR_SIZE in sunldi.h only if it's not already defined. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #697
| * Remove syseventsBrian Behlendorf2018-04-044-71/+1
| | | | | | | | | | | | | | | | These headers are provided in the ZFS repository and never used by the SPL. Remove them to ensure the right ones are included. Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #696
| * Fix multiple evaluations of VERIFY() and ASSERT() on failuresDeHackEd2018-02-211-6/+9
| | | | | | | | | | | | | | Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: DHE <[email protected]> Closes #684 Closes #685
| * Fix cstyle warningsBrian Behlendorf2018-02-07104-739/+760
| | | | | | | | | | | | | | | | This patch contains no functional changes. It is solely intended to resolve cstyle warnings in order to facilitate moving the spl source code in to the zfs repository. Signed-off-by: Brian Behlendorf <[email protected]> Closes #681
| * Add cv_timedwait_io()Brian Behlendorf2018-01-241-0/+2
| | | | | | | | | | | | | | | | Add missing helper function cv_timedwait_io(), it should be used when waiting on IO with a specified timeout. Reviewed-by: Tim Chase <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #674
| * Linux 4.14 compat: vfs_read & vfs_writeBrian Behlendorf2017-11-151-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel_read & kernel_write functions have always wrapped the vfs_read & vfs_write functions respectively. However, they could not be used by vn_rdwr() since the offset wasn't passed as a pointer. This prevented us from being able to properly update the file offset. Linux 4.14 unexported vfs_read & vfs_write but also changed the signature of kernel_read & kernel_write to provide the needed functionality. Use these updated functions when available. Reviewed-by: Pritam Baral <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #656 Closes #667
| * Remove vn_rename and vn_removeBrian Behlendorf2017-10-271-2/+0
| | | | | | | | | | | | | | | | | | | | Both vn_rename and vn_remove have been historically problematic to implement reliably. Rather than fixing them yet again they are being removed. Reviewed-by: Arkadiusz Bubala <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #648 Closes #661
| * Add parenthesis to btop and ptob macrosBrian Behlendorf2017-10-101-2/+2
| | | | | | | | | | | | | | | | | | Add missing parenthesis around btop and ptob macros to ensure operation ordering is preserved after expansion. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #660
| * Make include/linux/ conform to ZFS style standardOlaf Faaland2017-10-0911-53/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No semantic changes. Fix the following types of style issues: blank after preprocessor # #define followed by space instead of tab improper first line of block comment indent by spaces instead of tabs last line in file is blank missing blank after open comment missing space before left brace non-continuation indented 4 spaces spaces instead of tabs unparenthesized return expression Signed-off-by: Olaf Faaland <[email protected]>
| * Make file headers conform to ZFS style standardOlaf Faaland2017-10-09121-242/+242
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No semantic changes. Change /************\ and \************/ to /* and */ Signed-off-by: Olaf Faaland <[email protected]>
| * Allow longer SPA names in statsgaurkuma2017-08-111-1/+1
| | | | | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: gaurkuma <[email protected]> Closes #641
| * Remove misguided HAVE_MUTEX_OWNER check, take 2Oleg Drokin2017-08-021-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is just plain unsafe to peek inside in-kernel mutex structure and make assumptions about what kernel does with those internal fields like owner. Kernel is all too happy to stop doing the expected things like tracing lock owner once you load a tainted module like spl/zfs that is not GPL. As such you will get instant assertion failures like this: VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)& ((&((&zo->zo_lock)->m_mutex))->owner))) == ((void *)0)) failed (ffff88030be28500 == (null)) PANIC at zfs_onexit.c:104:zfs_onexit_destroy() Showing stack for process 3626 CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x19/0x1b spl_dumpstack+0x44/0x50 [spl] spl_panic+0xbf/0xf0 [spl] zfs_onexit_destroy+0x17c/0x280 [zfs] zfsdev_release+0x48/0xd0 [zfs] Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Gvozden Neskovic <[email protected]> Signed-off-by: Oleg Drokin <[email protected]> Closes #639 Closes #632
| * spl-mutex: fix race in mutex_exitGvozden Neskovic2017-08-021-3/+4
| | | | | | | | | | | | | | | | | | | | Prevent race on accessing kmutex_t when the mutex is embedded in a ref counted structure. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Gvozden Neskovic <[email protected]> Closes zfsonlinux/zfs#6401 Closes #637
| * Revert "Remove misguided HAVE_MUTEX_OWNER check"Brian Behlendorf2017-08-021-0/+10
| | | | | | | | | | | | | | | | | | This reverts commit d89616fda88bc030aaff758d37ede7d35e58841a which introduced some build failures which need to be resolved before this can be merged. Signed-off-by: Brian Behlendorf <[email protected]> Issue #633
| * Remove misguided HAVE_MUTEX_OWNER checkOleg Drokin2017-08-021-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is just plain unsafe to peek inside in-kernel mutex structure and make assumptions about what kernel does with those internal fields like owner. Kernel is all too happy to stop doing the expected things like tracing lock owner once you load a tainted module like spl/zfs that is not GPL. As such you will get instant assertion failures like this: VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)& ((&((&zo->zo_lock)->m_mutex))->owner))) == ((void *)0)) failed (ffff88030be28500 == (null)) PANIC at zfs_onexit.c:104:zfs_onexit_destroy() Showing stack for process 3626 CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x19/0x1b spl_dumpstack+0x44/0x50 [spl] spl_panic+0xbf/0xf0 [spl] zfs_onexit_destroy+0x17c/0x280 [zfs] zfsdev_release+0x48/0xd0 [zfs] Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Oleg Drokin <[email protected]> Closes #632 Closes #633
| * Linux 4.13 compat: wait queuesBrian Behlendorf2017-07-234-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | Commit torvalds/linux@ac6424b9 - Renamed struct wait_queue -> struct wait_queue_entry. Commit torvalds/linux@2055da97 - Renamed wait_queue_head::task_list -> wait_queue_head::head - Renamed wait_queue_entry::task_list -> wait_queue_entry::entry Reviewed-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #629
| * Add ASSERT3B/VERIFY3B/USEC2NSEC/NSEC2USEC macrosPrakash Surya2017-07-132-2/+10
| | | | | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Prakash Surya <[email protected]> Closes #627
| * Linux 4.12 compat: PF_FSTRANS was removedChunwei Chen2017-05-091-4/+33
| | | | | | | | | | | | | | | | Change SPL_FSTRANS to optionally contains PF_FSTRANS. Also, add __spl_pf_fstrans_check for the checks specifically for PF_FSTRANS. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #614
| * Linux 4.11 compat: add linux/sched/signal.hOlaf Faaland2017-03-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In Linux 4.11, torvalds/linux@2a1f062, signal handling related functions were moved from sched.h into sched/signal.h. Add configure checks to detect this and include the new file where needed. Reviewed-by: Chunwei Chen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #608
| * Add a PAGESHIFT definitionDavid Quigley2017-01-311-0/+5
| | | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: David Quigley <[email protected]> Closes #598
| * Add support for rw semaphore under PREEMPT_RT_FULLClemens Fruhwirth2016-12-191-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main complication from the RT patch set is that the RW semaphore locks change such that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count. The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code. Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with the kernel linux-image-4.8.0-1-rt-amd64. Tested-by: kernelOfTruth <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Clemens Fruhwirth <[email protected]> Closes zfsonlinux/zfs#5491 Closes #589 Closes #308
| * Remove stale comment from rw_tryupgrade()Clemens Fruhwirth2016-12-191-8/+0
| | | | | | | | | | | | | | | | | | Commit f58040c0fc8bc6490fcc75db7fc3e709dfc3c656 should have removed this comment which is no longer relevant. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Clemens Fruhwirth <[email protected]> Issue #589
| * Add system_delay_taskq for long delayChunwei Chen2016-12-081-0/+2
| | | | | | | | | | | | | | | | | | Add a dedicated system_delay_taskq for long delay like spa_deadman and zpl_posix_acl_free. This will allow us to use system_taskq in the manner of dispatch multiple tasks and call taskq_wait_outstanding. Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #588
| * Add TASKQID_INVALID and TASKQID_INITIAL macrosUbuntu2016-11-021-0/+6
| | | | | | | | | | | | | | | | Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the taskq implementation and test cases to use them. This is solely for the purposes of readability and introduces no functional change. Signed-off-by: Brian Behlendorf <[email protected]>
| * Linux 4.9 compat: group_info changesChunwei Chen2016-10-201-0/+5
| | | | | | | | | | | | | | | | | | In Linux 4.9, torvalds/linux@81243ea, group_info changed from 2d array via ->blocks to 1d array via ->gid. We change the spl cred functions accordingly. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #581
| * Linux 4.8 compat: Fix RW_READ_HELDtuxoko2016-10-071-1/+6
| | | | | | | | | | | | | | | | | | | | Linux 4.8, starting from torvalds/linux@19c5d690e, will set owner to 1 when read held instead of leave it NULL. So we change the condition to `rw_owner(rwp) <= 1` in RW_READ_HELD. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes zfsonlinux/zfs#5233 Closes #577
| * Cleanup in cred.htuxoko2016-09-141-12/+0
| | | | | | | | | | | | | | Remove the code that doesn't make any sense. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes #569
| * Linux 4.8 compat: rw_semaphore atomic_long_t countBrian Behlendorf2016-07-291-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For non-rwsem-spinlocks the "count" member was changed from a "long" to "atomic_long_t" type. A configure check has been added to detect this change along with new versions of the _rwsem_tryupgrade() function and RWSEM_COUNT() macro. See https://github.com/torvalds/linux/commit/8ee62b18 for complete details. Signed-off-by: Tim Chase <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #563
| * Added highbit() and lowbit() macrosTom Caputi2016-07-201-0/+3
| | | | | | | | | | | | | | Signed-off-by: Tom Caputi <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #562
| * Add _ALIGNMENT_REQUIRED to isa_defs.h for checksumsTony Hutter2016-06-211-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | _ALIGNMENT_REQUIRED needs to be #defined in isa_defs.h in order to port the Illumos checksum code to ZoL: 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R OpenZFS-issue: https://www.illumos.org/issues/4185 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee Signed-off-by: Tony Hutter <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #561
| * Implement a proper rw_tryupgradeChunwei Chen2016-05-312-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes zfsonlinux/zfs#4692 Closes #554
| * Add isa_defs for MIPSYunQiang Su2016-05-311-1/+22
| | | | | | | | | | | | | | | | | | GCC for MIPS only defines _LP64 when 64bit, while no _ILP32 defined when 32bit. Signed-off-by: YunQiang Su <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #558