Diffstat (limited to 'man/man5')
-rw-r--r--  man/man5/Makefile.am                 4
-rw-r--r--  man/man5/vdev_id.conf.5            188
-rw-r--r--  man/man5/zfs-events.5              929
-rw-r--r--  man/man5/zfs-module-parameters.5  2754
-rw-r--r--  man/man5/zpool-features.5          720
5 files changed, 4595 insertions, 0 deletions
diff --git a/man/man5/Makefile.am b/man/man5/Makefile.am
new file mode 100644
index 000000000..4746914c5
--- /dev/null
+++ b/man/man5/Makefile.am
@@ -0,0 +1,4 @@
+dist_man_MANS = vdev_id.conf.5 zpool-features.5 zfs-module-parameters.5 zfs-events.5
+
+install-data-local:
+ $(INSTALL) -d -m 0755 "$(DESTDIR)$(mandir)/man5"
diff --git a/man/man5/vdev_id.conf.5 b/man/man5/vdev_id.conf.5
new file mode 100644
index 000000000..50caa92c0
--- /dev/null
+++ b/man/man5/vdev_id.conf.5
@@ -0,0 +1,188 @@
+.TH vdev_id.conf 5
+.SH NAME
+vdev_id.conf \- Configuration file for vdev_id
+.SH DESCRIPTION
+.I vdev_id.conf
+is the configuration file for
+.BR vdev_id (8).
+It controls the default behavior of
+.BR vdev_id (8)
+while it is mapping a disk device name to an alias.
+.PP
+The
+.I vdev_id.conf
+file uses a simple format consisting of a keyword followed by one or
+more values on a single line. Any line not beginning with a recognized
+keyword is ignored. Comments may optionally begin with a hash
+character.
+
+The following keywords and values are used.
+.TP
+\fIalias\fR <name> <devlink>
+Maps a device link in the /dev directory hierarchy to a new device
+name. The udev rule defining the device link must have run prior to
+.BR vdev_id (8).
+A defined alias takes precedence over a topology-derived name, but the
+two naming methods can otherwise coexist. For example, one might name
+drives in a JBOD with the sas_direct topology while naming an internal
+L2ARC device with an alias.
+
+\fIname\fR - the name of the link to the device that will be created in
+/dev/disk/by-vdev.
+
+\fIdevlink\fR - the name of the device link that has already been
+defined by udev. This may be an absolute path or the base filename.
+
+.TP
+\fIchannel\fR [pci_slot] <port> <name>
+Maps a physical path to a channel name (typically representing a single
+disk enclosure).
+
+\fIpci_slot\fR - specifies the PCI SLOT of the HBA
+hosting the disk enclosure being mapped, as found in the output of
+.BR lspci (8).
+This argument is not used in sas_switch mode.
+
+\fIport\fR - specifies the numeric identifier of the HBA or SAS switch port
+connected to the disk enclosure being mapped.
+
+\fIname\fR - specifies the name of the channel.
+
+.TP
+\fIslot\fR <old> <new> [channel]
+Maps a disk slot number as reported by the operating system to an
+alternative slot number. If the \fIchannel\fR parameter is specified
+then the mapping is only applied to slots in the named channel,
+otherwise the mapping is applied to all channels. The first-specified
+\fIslot\fR rule that can match a slot takes precedence. Therefore a
+channel-specific mapping for a given slot should generally appear before
+a generic mapping for the same slot. In this way a custom mapping may
+be applied to a particular channel and a default mapping applied to the
+others.
+
+.TP
+\fImultipath\fR <yes|no>
+Specifies whether
+.BR vdev_id (8)
+will handle only dm-multipath devices. If set to "yes" then
+.BR vdev_id (8)
+will examine the first running component disk of a dm-multipath
+device as listed by the
+.BR multipath (8)
+command to determine the physical path.
+.TP
+\fItopology\fR <sas_direct|sas_switch>
+Identifies a physical topology that governs how physical paths are
+mapped to channels.
+
+\fIsas_direct\fR - in this mode a channel is uniquely identified by
+a PCI slot and an HBA port number
+
+\fIsas_switch\fR - in this mode a channel is uniquely identified by
+a SAS switch port number
+
+.TP
+\fIphys_per_port\fR <num>
+Specifies the number of PHY devices associated with a SAS HBA port or SAS
+switch port.
+.BR vdev_id (8)
+internally uses this value to determine which HBA or switch port a
+device is connected to. The default is 4.
+
+.TP
+\fIslot\fR <bay|phy|port|id|lun|ses>
+Specifies from which element of a SAS identifier the slot number is
+taken. The default is bay.
+
+\fIbay\fR - read the slot number from the bay identifier.
+
+\fIphy\fR - read the slot number from the phy identifier.
+
+\fIport\fR - use the SAS port as the slot number.
+
+\fIid\fR - use the scsi id as the slot number.
+
+\fIlun\fR - use the scsi lun as the slot number.
+
+\fIses\fR - use the SCSI Enclosure Services (SES) enclosure device slot number,
+as reported by
+.BR sg_ses (8).
+This is intended for use only on systems where \fIbay\fR is unsupported,
+noting that \fIport\fR and \fIid\fR may be unstable across disk replacement.
+.SH EXAMPLES
+A non-multipath configuration with direct-attached SAS enclosures and an
+arbitrary slot re-mapping.
+.P
+.nf
+ multipath no
+ topology sas_direct
+ phys_per_port 4
+ slot bay
+
+ # PCI_SLOT HBA PORT CHANNEL NAME
+ channel 85:00.0 1 A
+ channel 85:00.0 0 B
+ channel 86:00.0 1 C
+ channel 86:00.0 0 D
+
+ # Custom mapping for Channel A
+
+ # Linux Mapped
+ # Slot Slot Channel
+ slot 1 7 A
+ slot 2 10 A
+ slot 3 3 A
+ slot 4 6 A
+
+ # Default mapping for B, C, and D
+
+ slot 1 4
+ slot 2 2
+ slot 3 1
+ slot 4 3
+.fi
+.P
+A SAS-switch topology. Note that the
+.I channel
+keyword takes only two arguments in this example.
+.P
+.nf
+ topology sas_switch
+
+ # SWITCH PORT CHANNEL NAME
+ channel 1 A
+ channel 2 B
+ channel 3 C
+ channel 4 D
+.fi
+.P
+A multipath configuration. Note that channel names have multiple
+definitions - one per physical path.
+.P
+.nf
+ multipath yes
+
+ # PCI_SLOT HBA PORT CHANNEL NAME
+ channel 85:00.0 1 A
+ channel 85:00.0 0 B
+ channel 86:00.0 1 A
+ channel 86:00.0 0 B
+.fi
+.P
+A configuration using device link aliases.
+.P
+.nf
+ # by-vdev
+ # name fully qualified or base name of device link
+ alias d1 /dev/disk/by-id/wwn-0x5000c5002de3b9ca
+ alias d2 wwn-0x5000c5002def789e
+.fi
+.P
+
+.SH FILES
+.TP
+.I /etc/zfs/vdev_id.conf
+The configuration file for
+.BR vdev_id (8).
+.SH SEE ALSO
+.BR vdev_id (8)
diff --git a/man/man5/zfs-events.5 b/man/man5/zfs-events.5
new file mode 100644
index 000000000..4c60eecc5
--- /dev/null
+++ b/man/man5/zfs-events.5
@@ -0,0 +1,929 @@
+'\" te
+.\" Copyright (c) 2013 by Turbo Fredriksson <[email protected]>. All rights reserved.
+.\" The contents of this file are subject to the terms of the Common Development
+.\" and Distribution License (the "License"). You may not use this file except
+.\" in compliance with the License. You can obtain a copy of the license at
+.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
+.\"
+.\" See the License for the specific language governing permissions and
+.\" limitations under the License. When distributing Covered Code, include this
+.\" CDDL HEADER in each file and include the License file at
+.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
+.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
+.\" own identifying information:
+.\" Portions Copyright [yyyy] [name of copyright owner]
+.TH ZFS-EVENTS 5 "Jun 6, 2015"
+.SH NAME
+zfs\-events \- Events created by the ZFS filesystem.
+.SH DESCRIPTION
+.sp
+.LP
+Description of the different events generated by the ZFS stack.
+.sp
+Most of these don't have any description. The events generated by ZFS
+have never been publicly documented. What is here is intended as a
+starting point to provide documentation for all possible events.
+.sp
+To view all events created since the loading of the ZFS infrastructure
+(i.e., "the module"), run
+.P
+.nf
+\fBzpool events\fR
+.fi
+.P
+to get a short list, and
+.P
+.nf
+\fBzpool events -v\fR
+.fi
+.P
+to get full details of the events and what information
+is available about them.
+.sp
+This man page lists the different subclasses that are issued
+in the case of an event. The full event name would be
+\fIereport.fs.zfs.SUBCLASS\fR, but we only list the last
+part here.
+
+.SS "EVENTS (SUBCLASS)"
+.sp
+.LP
+
+.sp
+.ne 2
+.na
+\fBchecksum\fR
+.ad
+.RS 12n
+Issued when a checksum error has been detected.
+.RE
+
+.sp
+.ne 2
+.na
+\fBio\fR
+.ad
+.RS 12n
+Issued when there is an I/O error in a vdev in the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdata\fR
+.ad
+.RS 12n
+Issued when there have been data errors in the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdeadman\fR
+.ad
+.RS 12n
+Issued when an I/O is determined to be "hung". This can be caused by lost
+completion events due to flaky hardware or drivers. See the
+\fBzfs_deadman_failmode\fR module option description for additional
+information regarding "hung" I/O detection and configuration.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdelay\fR
+.ad
+.RS 12n
+Issued when a completed I/O exceeds the maximum allowed time specified
+by the \fBzio_delay_max\fR module option. This can be an indicator of
+problems with the underlying storage device.
+.RE
+
+.sp
+.ne 2
+.na
+\fBconfig.sync\fR
+.ad
+.RS 12n
+Issued every time a vdev change has been made to the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzpool\fR
+.ad
+.RS 12n
+Issued when a pool cannot be imported.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzpool.destroy\fR
+.ad
+.RS 12n
+Issued when a pool is destroyed.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzpool.export\fR
+.ad
+.RS 12n
+Issued when a pool is exported.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzpool.import\fR
+.ad
+.RS 12n
+Issued when a pool is imported.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzpool.reguid\fR
+.ad
+.RS 12n
+Issued when a REGUID (a new unique identifier for the pool) has been generated.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.unknown\fR
+.ad
+.RS 12n
+Issued when the vdev is unknown, for example when trying to clear device
+errors on a vdev that has failed or been removed from the system/pool
+and is no longer available.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.open_failed\fR
+.ad
+.RS 12n
+Issued when a vdev could not be opened (for example, because it didn't exist).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.corrupt_data\fR
+.ad
+.RS 12n
+Issued when corrupt data has been detected on a vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.no_replicas\fR
+.ad
+.RS 12n
+Issued when there are no more replicas to sustain the pool.
+This would lead to the pool being \fIDEGRADED\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.bad_guid_sum\fR
+.ad
+.RS 12n
+Issued when a missing device in the pool has been detected.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.too_small\fR
+.ad
+.RS 12n
+Issued when the system (kernel) has removed a device, and ZFS
+notices that the device isn't there anymore. This is usually
+followed by a \fBprobe_failure\fR event.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.bad_label\fR
+.ad
+.RS 12n
+Issued when the vdev label is intact but its contents are invalid.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.bad_ashift\fR
+.ad
+.RS 12n
+Issued when the ashift alignment requirement has increased.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.remove\fR
+.ad
+.RS 12n
+Issued when a vdev is detached from a mirror (or a spare detached from a
+vdev where it has been used to replace a failed drive - only works if
+the original drive has been re-added).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.clear\fR
+.ad
+.RS 12n
+Issued when clearing device errors in a pool, for example by running
+\fBzpool clear\fR on a device in the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.check\fR
+.ad
+.RS 12n
+Issued when a check to see if a given vdev could be opened is started.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.spare\fR
+.ad
+.RS 12n
+Issued when a spare has kicked in to replace a failed device.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev.autoexpand\fR
+.ad
+.RS 12n
+Issued when a vdev can be automatically expanded.
+.RE
+
+.sp
+.ne 2
+.na
+\fBio_failure\fR
+.ad
+.RS 12n
+Issued when there is an I/O failure in a vdev in the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBprobe_failure\fR
+.ad
+.RS 12n
+Issued when a probe fails on a vdev. This would occur if a vdev
+has been removed from the system outside of ZFS (such as when the kernel
+has removed the device).
+.RE
+
+.sp
+.ne 2
+.na
+\fBlog_replay\fR
+.ad
+.RS 12n
+Issued when the intent log cannot be replayed. This can occur in the case
+of a missing or damaged log device.
+.RE
+
+.sp
+.ne 2
+.na
+\fBresilver.start\fR
+.ad
+.RS 12n
+Issued when a resilver is started.
+.RE
+
+.sp
+.ne 2
+.na
+\fBresilver.finish\fR
+.ad
+.RS 12n
+Issued when the running resilver has finished.
+.RE
+
+.sp
+.ne 2
+.na
+\fBscrub.start\fR
+.ad
+.RS 12n
+Issued when a scrub is started on a pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBscrub.finish\fR
+.ad
+.RS 12n
+Issued when a pool has finished scrubbing.
+.RE
+
+.sp
+.ne 2
+.na
+\fBscrub.abort\fR
+.ad
+.RS 12n
+Issued when a scrub is aborted on a pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBscrub.resume\fR
+.ad
+.RS 12n
+Issued when a scrub is resumed on a pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBscrub.paused\fR
+.ad
+.RS 12n
+Issued when a scrub is paused on a pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbootfs.vdev.attach\fR
+.ad
+.RS 12n
+.RE
+
+.SS "PAYLOADS"
+.sp
+.LP
+This is the payload (data, information) that accompanies an
+event.
+.sp
+For
+.BR zed (8),
+these are set to uppercase and prefixed with \fBZEVENT_\fR.
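+.sp
+For example, under that naming scheme the payload members \fBpool\fR and
+\fBvdev_guid\fR described below would be passed to ZEDLETs as environment
+variables named as follows:
+.sp
+.nf
+        pool       ->  ZEVENT_POOL
+        vdev_guid  ->  ZEVENT_VDEV_GUID
+.fi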
+
+.sp
+.ne 2
+.na
+\fBpool\fR
+.ad
+.RS 12n
+Pool name.
+.RE
+
+.sp
+.ne 2
+.na
+\fBpool_failmode\fR
+.ad
+.RS 12n
+Failmode - \fBwait\fR, \fBcontinue\fR or \fBpanic\fR.
+See
+.BR zpool (8)
+(\fIfailmode\fR property) for more information.
+.RE
+
+.sp
+.ne 2
+.na
+\fBpool_guid\fR
+.ad
+.RS 12n
+The GUID of the pool.
+.RE
+
+.sp
+.ne 2
+.na
+\fBpool_context\fR
+.ad
+.RS 12n
+The load state for the pool (0=none, 1=open, 2=import, 3=tryimport,
+4=recover, 5=error).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_guid\fR
+.ad
+.RS 12n
+The GUID of the vdev in question (the vdev failing or operated upon with
+\fBzpool clear\fR etc).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_type\fR
+.ad
+.RS 12n
+Type of vdev - \fBdisk\fR, \fBfile\fR, \fBmirror\fR etc. See
+.BR zpool (8)
+under \fBVirtual Devices\fR for more information on possible values.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_path\fR
+.ad
+.RS 12n
+Full path of the vdev, including any \fI-partX\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_devid\fR
+.ad
+.RS 12n
+ID of vdev (if any).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_fru\fR
+.ad
+.RS 12n
+Physical FRU location.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_state\fR
+.ad
+.RS 12n
+State of vdev (0=uninitialized, 1=closed, 2=offline, 3=removed, 4=failed to open, 5=faulted, 6=degraded, 7=healthy).
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_ashift\fR
+.ad
+.RS 12n
+The ashift value of the vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_complete_ts\fR
+.ad
+.RS 12n
+The time the last I/O completed for the specified vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_delta_ts\fR
+.ad
+.RS 12n
+The time since the last I/O completed for the specified vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_spare_paths\fR
+.ad
+.RS 12n
+List of spares, including full path and any \fI-partX\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_spare_guids\fR
+.ad
+.RS 12n
+GUID(s) of spares.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_read_errors\fR
+.ad
+.RS 12n
+The number of read errors that have been detected on the vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_write_errors\fR
+.ad
+.RS 12n
+The number of write errors that have been detected on the vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_cksum_errors\fR
+.ad
+.RS 12n
+The number of checksum errors that have been detected on the vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBparent_guid\fR
+.ad
+.RS 12n
+GUID of the vdev parent.
+.RE
+
+.sp
+.ne 2
+.na
+\fBparent_type\fR
+.ad
+.RS 12n
+Type of parent. See \fBvdev_type\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBparent_path\fR
+.ad
+.RS 12n
+Path of the vdev parent (if any).
+.RE
+
+.sp
+.ne 2
+.na
+\fBparent_devid\fR
+.ad
+.RS 12n
+ID of the vdev parent (if any).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_objset\fR
+.ad
+.RS 12n
+The object set number for a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_object\fR
+.ad
+.RS 12n
+The object number for a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_level\fR
+.ad
+.RS 12n
+The block level for a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_blkid\fR
+.ad
+.RS 12n
+The block ID for a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_err\fR
+.ad
+.RS 12n
+The errno for a failure when handling a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_offset\fR
+.ad
+.RS 12n
+The offset in bytes of where to write the I/O for the specified vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_size\fR
+.ad
+.RS 12n
+The size in bytes of the I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_flags\fR
+.ad
+.RS 12n
+The current flags describing how the I/O should be handled. See the
+\fBI/O FLAGS\fR section for the full list of I/O flags.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_stage\fR
+.ad
+.RS 12n
+The current stage of the I/O in the pipeline. See the \fBI/O STAGES\fR
+section for a full list of all the I/O stages.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_pipeline\fR
+.ad
+.RS 12n
+The valid pipeline stages for the I/O. See the \fBI/O STAGES\fR section for a
+full list of all the I/O stages.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_delay\fR
+.ad
+.RS 12n
+The time in ticks (HZ) required for the block layer to service the I/O. Unlike
+\fBzio_delta\fR this does not include any vdev queuing time and is therefore
+solely a measure of the block layer performance. On most modern Linux systems
+HZ is defined as 1000 making a tick equivalent to 1 millisecond.
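+.sp
+As a worked example, on a system where HZ is 1000 a \fBzio_delay\fR value of
+27 corresponds to roughly 27 ms spent in the block layer servicing that I/O.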
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_timestamp\fR
+.ad
+.RS 12n
+The time when a given I/O was submitted.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_delta\fR
+.ad
+.RS 12n
+The time required to service a given I/O.
+.RE
+
+.sp
+.ne 2
+.na
+\fBprev_state\fR
+.ad
+.RS 12n
+The previous state of the vdev.
+.RE
+
+.sp
+.ne 2
+.na
+\fBcksum_expected\fR
+.ad
+.RS 12n
+The expected checksum value.
+.RE
+
+.sp
+.ne 2
+.na
+\fBcksum_actual\fR
+.ad
+.RS 12n
+The actual/current checksum value.
+.RE
+
+.sp
+.ne 2
+.na
+\fBcksum_algorithm\fR
+.ad
+.RS 12n
+Checksum algorithm used. See \fBzfs\fR(8) for more information on checksum algorithms available.
+.RE
+
+.sp
+.ne 2
+.na
+\fBcksum_byteswap\fR
+.ad
+.RS 12n
+Checksum value is byte swapped.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_ranges\fR
+.ad
+.RS 12n
+Checksum bad offset ranges.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_ranges_min_gap\fR
+.ad
+.RS 12n
+Checksum allowed minimum gap.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_range_sets\fR
+.ad
+.RS 12n
+For each checksum bad range, the number of bits set in that range.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_range_clears\fR
+.ad
+.RS 12n
+For each checksum bad range, the number of bits cleared in that range.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_set_bits\fR
+.ad
+.RS 12n
+Checksum array of bits set.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_cleared_bits\fR
+.ad
+.RS 12n
+Checksum array of bits cleared.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_set_histogram\fR
+.ad
+.RS 12n
+Checksum histogram of set bits by bit number in a 64-bit word.
+.RE
+
+.sp
+.ne 2
+.na
+\fBbad_cleared_histogram\fR
+.ad
+.RS 12n
+Checksum histogram of cleared bits by bit number in a 64-bit word.
+.RE
+
+.SS "I/O STAGES"
+.sp
+.LP
+The ZFS I/O pipeline is composed of various stages which are defined
+below. The individual stages are used to construct these basic I/O
+operations: Read, Write, Free, Claim, and Ioctl. These stages may be
+set on an event to describe the life cycle of a given I/O.
+
+.TS
+tab(:);
+l l l .
+Stage:Bit Mask:Operations
+_:_:_
+ZIO_STAGE_OPEN:0x00000001:RWFCI
+
+ZIO_STAGE_READ_BP_INIT:0x00000002:R----
+ZIO_STAGE_FREE_BP_INIT:0x00000004:--F--
+ZIO_STAGE_ISSUE_ASYNC:0x00000008:RWF--
+ZIO_STAGE_WRITE_BP_INIT:0x00000010:-W---
+
+ZIO_STAGE_CHECKSUM_GENERATE:0x00000020:-W---
+
+ZIO_STAGE_NOP_WRITE:0x00000040:-W---
+
+ZIO_STAGE_DDT_READ_START:0x00000080:R----
+ZIO_STAGE_DDT_READ_DONE:0x00000100:R----
+ZIO_STAGE_DDT_WRITE:0x00000200:-W---
+ZIO_STAGE_DDT_FREE:0x00000400:--F--
+
+ZIO_STAGE_GANG_ASSEMBLE:0x00000800:RWFC-
+ZIO_STAGE_GANG_ISSUE:0x00001000:RWFC-
+
+ZIO_STAGE_DVA_ALLOCATE:0x00002000:-W---
+ZIO_STAGE_DVA_FREE:0x00004000:--F--
+ZIO_STAGE_DVA_CLAIM:0x00008000:---C-
+
+ZIO_STAGE_READY:0x00010000:RWFCI
+
+ZIO_STAGE_VDEV_IO_START:0x00020000:RW--I
+ZIO_STAGE_VDEV_IO_DONE:0x00040000:RW--I
+ZIO_STAGE_VDEV_IO_ASSESS:0x00080000:RW--I
+
+ZIO_STAGE_CHECKSUM_VERIFY0:0x00100000:R----
+
+ZIO_STAGE_DONE:0x00200000:RWFCI
+.TE
+
+.SS "I/O FLAGS"
+.sp
+.LP
+Every I/O in the pipeline contains a set of flags which describe its
+function and are used to govern its behavior. These flags will be set
+in an event as a \fBzio_flags\fR payload entry.
+
+.TS
+tab(:);
+l l .
+Flag:Bit Mask
+_:_
+ZIO_FLAG_DONT_AGGREGATE:0x00000001
+ZIO_FLAG_IO_REPAIR:0x00000002
+ZIO_FLAG_SELF_HEAL:0x00000004
+ZIO_FLAG_RESILVER:0x00000008
+ZIO_FLAG_SCRUB:0x00000010
+ZIO_FLAG_SCAN_THREAD:0x00000020
+ZIO_FLAG_PHYSICAL:0x00000040
+
+ZIO_FLAG_CANFAIL:0x00000080
+ZIO_FLAG_SPECULATIVE:0x00000100
+ZIO_FLAG_CONFIG_WRITER:0x00000200
+ZIO_FLAG_DONT_RETRY:0x00000400
+ZIO_FLAG_DONT_CACHE:0x00000800
+ZIO_FLAG_NODATA:0x00001000
+ZIO_FLAG_INDUCE_DAMAGE:0x00002000
+
+ZIO_FLAG_IO_RETRY:0x00004000
+ZIO_FLAG_PROBE:0x00008000
+ZIO_FLAG_TRYHARD:0x00010000
+ZIO_FLAG_OPTIONAL:0x00020000
+
+ZIO_FLAG_DONT_QUEUE:0x00040000
+ZIO_FLAG_DONT_PROPAGATE:0x00080000
+ZIO_FLAG_IO_BYPASS:0x00100000
+ZIO_FLAG_IO_REWRITE:0x00200000
+ZIO_FLAG_RAW:0x00400000
+ZIO_FLAG_GANG_CHILD:0x00800000
+ZIO_FLAG_DDT_CHILD:0x01000000
+ZIO_FLAG_GODFATHER:0x02000000
+ZIO_FLAG_NOPWRITE:0x04000000
+ZIO_FLAG_REEXECUTED:0x08000000
+ZIO_FLAG_DELEGATED:0x10000000
+ZIO_FLAG_FASTWRITE:0x20000000
+.TE
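+.sp
+.LP
+As an illustration, a \fBzio_flags\fR payload value of \fB0x00000180\fR
+corresponds, per the table above, to the combination of
+ZIO_FLAG_CANFAIL (0x00000080) and ZIO_FLAG_SPECULATIVE (0x00000100):
+.sp
+.nf
+        0x00000080 | 0x00000100 = 0x00000180
+.fi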
diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5
new file mode 100644
index 000000000..dbfa8806a
--- /dev/null
+++ b/man/man5/zfs-module-parameters.5
@@ -0,0 +1,2754 @@
+'\" te
+.\" Copyright (c) 2013 by Turbo Fredriksson <[email protected]>. All rights reserved.
+.\" Copyright (c) 2017 Datto Inc.
+.\" The contents of this file are subject to the terms of the Common Development
+.\" and Distribution License (the "License"). You may not use this file except
+.\" in compliance with the License. You can obtain a copy of the license at
+.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
+.\"
+.\" See the License for the specific language governing permissions and
+.\" limitations under the License. When distributing Covered Code, include this
+.\" CDDL HEADER in each file and include the License file at
+.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
+.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
+.\" own identifying information:
+.\" Portions Copyright [yyyy] [name of copyright owner]
+.TH ZFS-MODULE-PARAMETERS 5 "Oct 28, 2017"
+.SH NAME
+zfs\-module\-parameters \- ZFS module parameters
+.SH DESCRIPTION
+.sp
+.LP
+Description of the different parameters to the ZFS module.
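+.sp
+Parameters are typically set either persistently at module load time or,
+where writable, at runtime. As a minimal sketch (paths as found on a typical
+ZFS on Linux installation; the value shown is only an example):
+.sp
+.nf
+        # persistent, read at module load
+        # /etc/modprobe.d/zfs.conf
+        options zfs zfs_arc_max=4294967296
+
+        # runtime, for writable parameters
+        echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
+.fi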
+
+.SS "Module parameters"
+.sp
+.LP
+
+.sp
+.ne 2
+.na
+\fBdbuf_cache_max_bytes\fR (ulong)
+.ad
+.RS 12n
+Maximum size in bytes of the dbuf cache. When \fB0\fR this value will default
+to \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size, otherwise the
+provided value in bytes will be used. The behavior of the dbuf cache and its
+associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR
+kstat.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdbuf_cache_hiwater_pct\fR (uint)
+.ad
+.RS 12n
+The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted
+directly.
+.sp
+Default value: \fB10\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdbuf_cache_lowater_pct\fR (uint)
+.ad
+.RS 12n
+The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops
+evicting dbufs.
+.sp
+Default value: \fB10\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdbuf_cache_shift\fR (int)
+.ad
+.RS 12n
+Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
+of the target arc size.
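+.sp
+As a worked example, with the default \fBdbuf_cache_shift\fR of 5 and a
+hypothetical target ARC size of 4 GiB:
+.sp
+.nf
+        dbuf_cache_max_bytes = (target ARC size) >> dbuf_cache_shift
+                             = 4 GiB >> 5 = 128 MiB
+.fi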
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBignore_hole_birth\fR (int)
+.ad
+.RS 12n
+When set, the hole_birth optimization will not be used, and all holes will
+always be sent on zfs send. Useful if you suspect your datasets are affected
+by a bug in hole_birth.
+.sp
+Use \fB1\fR for on (default) and \fB0\fR for off.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_feed_again\fR (int)
+.ad
+.RS 12n
+Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as
+fast as possible.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR to disable.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_feed_min_ms\fR (ulong)
+.ad
+.RS 12n
+Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only
+applicable in related situations.
+.sp
+Default value: \fB200\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_feed_secs\fR (ulong)
+.ad
+.RS 12n
+Seconds between L2ARC writing
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_headroom\fR (ulong)
+.ad
+.RS 12n
+How far through the ARC lists to search for L2ARC cacheable content, expressed
+as a multiplier of \fBl2arc_write_max\fR
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_headroom_boost\fR (ulong)
+.ad
+.RS 12n
+Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being
+successfully compressed before writing. A value of 100 disables this feature.
+.sp
+Default value: \fB200\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_noprefetch\fR (int)
+.ad
+.RS 12n
+Do not write buffers to L2ARC if they were prefetched but not used by
+applications
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR to disable.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_norw\fR (int)
+.ad
+.RS 12n
+No reads during writes
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_write_boost\fR (ulong)
+.ad
+.RS 12n
+Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount
+while they remain cold.
+.sp
+Default value: \fB8,388,608\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBl2arc_write_max\fR (ulong)
+.ad
+.RS 12n
+Max write bytes per interval
+.sp
+Default value: \fB8,388,608\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_aliquot\fR (ulong)
+.ad
+.RS 12n
+Metaslab granularity, in bytes. This is roughly similar to what would be
+referred to as the "stripe size" in traditional RAID arrays. In normal
+operation, ZFS will try to write this amount of data to a top-level vdev
+before moving on to the next one.
+.sp
+Default value: \fB524,288\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_bias_enabled\fR (int)
+.ad
+.RS 12n
+Enable metaslab group biasing based on its vdev's over- or under-utilization
+relative to the pool.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_force_ganging\fR (ulong)
+.ad
+.RS 12n
+Make some blocks above a certain size be gang blocks. This option is used
+by the test suite to facilitate testing.
+.sp
+Default value: \fB16,777,217\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_metaslab_segment_weight_enabled\fR (int)
+.ad
+.RS 12n
+Enable/disable segment-based metaslab selection.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_metaslab_switch_threshold\fR (int)
+.ad
+.RS 12n
+When using segment-based metaslab selection, continue allocating
+from the active metaslab until \fBzfs_metaslab_switch_threshold\fR
+worth of buckets have been exhausted.
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_debug_load\fR (int)
+.ad
+.RS 12n
+Load all metaslabs during pool import.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_debug_unload\fR (int)
+.ad
+.RS 12n
+Prevent metaslabs from being unloaded.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_fragmentation_factor_enabled\fR (int)
+.ad
+.RS 12n
+Enable use of the fragmentation metric in computing metaslab weights.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslabs_per_vdev\fR (int)
+.ad
+.RS 12n
+When a vdev is added, it will be divided into approximately (but no more than) this number of metaslabs.
+.sp
+Default value: \fB200\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_preload_enabled\fR (int)
+.ad
+.RS 12n
+Enable metaslab group preloading.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBmetaslab_lba_weighting_enabled\fR (int)
+.ad
+.RS 12n
+Give more weight to metaslabs with lower LBAs, assuming they have
+greater bandwidth as is typically the case on a modern constant
+angular velocity disk drive.
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_config_path\fR (charp)
+.ad
+.RS 12n
+SPA config file
+.sp
+Default value: \fB/etc/zfs/zpool.cache\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_asize_inflation\fR (int)
+.ad
+.RS 12n
+Multiplication factor used to estimate actual disk consumption from the
+size of data being written. The default value is a worst case estimate,
+but lower values may be valid for a given pool depending on its
+configuration. Pool administrators who understand the factors involved
+may wish to specify a more realistic inflation factor, particularly if
+they operate close to quota or capacity limits.
+.sp
+Default value: \fB24\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_load_print_vdev_tree\fR (int)
+.ad
+.RS 12n
+Whether to print the vdev tree in the debugging message buffer during pool import.
+Use 0 to disable and 1 to enable.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_load_verify_data\fR (int)
+.ad
+.RS 12n
+Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR)
+import. Use 0 to disable and 1 to enable.
+
+An extreme rewind import normally performs a full traversal of all
+blocks in the pool for verification. If this parameter is set to 0,
+the traversal skips non-metadata blocks. It can be toggled once the
+import has started to stop or start the traversal of non-metadata blocks.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_load_verify_metadata\fR (int)
+.ad
+.RS 12n
+Whether to traverse blocks during an "extreme rewind" (\fB-X\fR)
+pool import. Use 0 to disable and 1 to enable.
+
+An extreme rewind import normally performs a full traversal of all
+blocks in the pool for verification. If this parameter is set to 0,
+the traversal is not performed. It can be toggled once the import has
+started to stop or start the traversal.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_load_verify_maxinflight\fR (int)
+.ad
+.RS 12n
+Maximum concurrent I/Os during the traversal performed during an "extreme
+rewind" (\fB-X\fR) pool import.
+.sp
+Default value: \fB10000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBspa_slop_shift\fR (int)
+.ad
+.RS 12n
+Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space
+in the pool to be consumed. This ensures that we don't run the pool
+completely out of space, due to unaccounted changes (e.g. to the MOS).
+It also limits the worst-case time to allocate space. If we have
+less than this amount of free space, most ZPL operations (e.g. write,
+create) will return ENOSPC.
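+.sp
+As a worked example, with the default \fBspa_slop_shift\fR of 5, 1/32 of the
+pool is reserved; on a hypothetical 10 TiB pool that is roughly 320 GiB:
+.sp
+.nf
+        slop = pool size / 2^spa_slop_shift = 10 TiB / 32 = 320 GiB
+.fi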
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBvdev_removal_max_span\fR (int)
+.ad
+.RS 12n
+During top-level vdev removal, chunks of data are copied from the vdev
+which may include free space in order to trade bandwidth for IOPS.
+This parameter determines the maximum span of free space (in bytes)
+which will be included as "unnecessary" data in a chunk of copied data.
+
+The default value here was chosen to align with
+\fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing
+regular reads (but there's no reason it has to be the same).
+.sp
+Default value: \fB32,768\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfetch_array_rd_sz\fR (ulong)
+.ad
+.RS 12n
+If prefetching is enabled, disable prefetching for reads larger than this size.
+.sp
+Default value: \fB1,048,576\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfetch_max_distance\fR (uint)
+.ad
+.RS 12n
+Max bytes to prefetch per stream (default 8MB).
+.sp
+Default value: \fB8,388,608\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfetch_max_streams\fR (uint)
+.ad
+.RS 12n
+Max number of streams per zfetch (prefetch streams per file).
+.sp
+Default value: \fB8\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfetch_min_sec_reap\fR (uint)
+.ad
+.RS 12n
+Min time before an active prefetch stream can be reclaimed
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_dnode_limit\fR (ulong)
+.ad
+.RS 12n
+When the number of bytes consumed by dnodes in the ARC exceeds this number of
+bytes, try to unpin some of it in response to demand for non-metadata. This
+value acts as a ceiling to the amount of dnode metadata, and defaults to 0,
+which indicates that a percentage based on \fBzfs_arc_dnode_limit_percent\fR
+of the ARC meta buffers may be used for dnodes.
+
+See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used
+when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather
+than in response to overall demand for non-metadata.
+
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_dnode_limit_percent\fR (ulong)
+.ad
+.RS 12n
+Percentage that can be consumed by dnodes of ARC meta buffers.
+.sp
+See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a
+higher priority if set to nonzero value.
+.sp
+Default value: \fB10\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_dnode_reduce_percent\fR (ulong)
+.ad
+.RS 12n
+Percentage of ARC dnodes to try to scan in response to demand for non-metadata
+when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR.
+
+.sp
+Default value: \fB10\fR% of the number of dnodes in the ARC.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_average_blocksize\fR (int)
+.ad
+.RS 12n
+The ARC's buffer hash table is sized based on the assumption of an average
+block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out
+to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers.
+For configurations with a known larger average block size this value can be
+increased to reduce the memory footprint.
+
+.sp
+Default value: \fB8192\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_evict_batch_limit\fR (int)
+.ad
+.RS 12n
+Number of ARC headers to evict per sub-list before proceeding to another sub-list.
+This batch-style operation prevents entire sub-lists from being evicted at once
+but comes at a cost of additional unlocking and locking.
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_grow_retry\fR (int)
+.ad
+.RS 12n
+If set to a non-zero value, it will replace the arc_grow_retry value with this value.
+The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before
+trying to resume growth after a memory pressure event.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_lotsfree_percent\fR (int)
+.ad
+.RS 12n
+Throttle I/O when free system memory drops below this percentage of total
+system memory. Setting this value to 0 will disable the throttle.
+.sp
+Default value: \fB10\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_max\fR (ulong)
+.ad
+.RS 12n
+Maximum size of the ARC in bytes. If set to 0 then it will consume 1/2 of system
+RAM. This value must be at least 67108864 (64 megabytes).
+.sp
+This value can be changed dynamically with some caveats. It cannot be set back
+to 0 while running and reducing it below the current ARC size will not cause
+the ARC to shrink without memory pressure to induce shrinking.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_adjust_restarts\fR (ulong)
+.ad
+.RS 12n
+The number of restart passes to make while scanning the ARC attempting
+to free buffers in order to stay below the \fBzfs_arc_meta_limit\fR.
+This value should not need to be tuned but is available to facilitate
+performance analysis.
+.sp
+Default value: \fB4096\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_limit\fR (ulong)
+.ad
+.RS 12n
+The maximum allowed size in bytes that meta data buffers are allowed to
+consume in the ARC. When this limit is reached meta data buffers will
+be reclaimed even if the overall arc_c_max has not been reached. This
+value defaults to 0, which indicates that a percentage based on
+\fBzfs_arc_meta_limit_percent\fR of the ARC may be used for meta data.
+.sp
+This value may be changed dynamically except that it cannot be set back to 0
+for a specific percent of the ARC; it must be set to an explicit value.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_limit_percent\fR (ulong)
+.ad
+.RS 12n
+Percentage of ARC buffers that can be used for meta data.
+
+See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a
+higher priority if set to nonzero value.
+
+.sp
+Default value: \fB75\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_min\fR (ulong)
+.ad
+.RS 12n
+The minimum allowed size in bytes that meta data buffers may consume in
+the ARC. This value defaults to 0, which disables a floor on the amount
+of the ARC devoted to meta data.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_prune\fR (int)
+.ad
+.RS 12n
+The number of dentries and inodes to be scanned looking for entries
+which can be dropped. This may be required when the ARC reaches the
+\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers
+in the ARC. Increasing this value will cause the dentry and inode caches
+to be pruned more aggressively. Setting this value to 0 will disable
+pruning the inode and dentry caches.
+.sp
+Default value: \fB10,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_meta_strategy\fR (int)
+.ad
+.RS 12n
+Define the strategy for ARC meta data buffer eviction (meta reclaim strategy).
+A value of 0 (META_ONLY) will evict only the ARC meta data buffers.
+A value of 1 (BALANCED) indicates that additional data buffers may be evicted
+if that is required in order to evict the required number of meta data buffers.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_min\fR (ulong)
+.ad
+.RS 12n
+Minimum size of the ARC in bytes. If set to 0 then arc_c_min will default to
+consuming the larger of 32M or 1/32 of total system memory.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_min_prefetch_ms\fR (int)
+.ad
+.RS 12n
+Minimum time prefetched blocks are locked in the ARC, specified in ms.
+A value of \fB0\fR will default to 1000 ms.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_min_prescient_prefetch_ms\fR (int)
+.ad
+.RS 12n
+Minimum time "prescient prefetched" blocks are locked in the ARC, specified
+in ms. These blocks are meant to be prefetched fairly aggressively ahead of
+the code that may use them. A value of \fB0\fR will default to 6000 ms.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_max_missing_tvds\fR (int)
+.ad
+.RS 12n
+Number of missing top-level vdevs which will be allowed during
+pool import (only in read-only mode).
+.sp
+Default value: \fB0\fR
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_multilist_num_sublists\fR (int)
+.ad
+.RS 12n
+To allow more fine-grained locking, each ARC state contains a series
+of lists for both data and meta data objects. Locking is performed at
+the level of these "sub-lists". This parameter controls the number of
+sub-lists per ARC state, and also applies to other uses of the
+multilist data structure.
+.sp
+Default value: \fB4\fR or the number of online CPUs, whichever is greater
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_overflow_shift\fR (int)
+.ad
+.RS 12n
+The ARC size is considered to be overflowing if it exceeds the current
+ARC target size (arc_c) by a threshold determined by this parameter.
+The threshold is calculated as a fraction of arc_c using the formula
+"arc_c >> \fBzfs_arc_overflow_shift\fR".
+
+The default value of 8 causes the ARC to be considered to be overflowing
+if it exceeds the target size by 1/256th (approximately 0.4%) of the target size.
+
+When the ARC is overflowing, new buffer allocations are stalled until
+the reclaim thread catches up and the overflow condition no longer exists.
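+.sp
+As a worked example, with the default shift of 8 and a hypothetical target
+ARC size (arc_c) of 8 GiB:
+.sp
+.nf
+        overflow threshold = arc_c >> zfs_arc_overflow_shift
+                           = 8 GiB >> 8 = 32 MiB
+.fi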
+.sp
+Default value: \fB8\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_p_min_shift\fR (int)
+.ad
+.RS 12n
+If set to a non-zero value, this will update arc_p_min_shift (default 4)
+with the new value.
+arc_p_min_shift is used as a shift of arc_c when calculating both the
+minimum and maximum arc_p.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_p_dampener_disable\fR (int)
+.ad
+.RS 12n
+Disable arc_p adapt dampener
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR to disable.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_shrink_shift\fR (int)
+.ad
+.RS 12n
+If set to a non-zero value, this will update arc_shrink_shift (default 7)
+with the new value.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_pc_percent\fR (uint)
+.ad
+.RS 12n
+Percent of pagecache to reclaim arc to
+
+This tunable allows ZFS arc to play more nicely with the kernel's LRU
+pagecache. It can guarantee that the arc size won't collapse under scanning
+pressure on the pagecache, yet still allows arc to be reclaimed down to
+zfs_arc_min if necessary. This value is specified as percent of pagecache
+size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
+only operates during memory pressure/reclaim.
+.sp
+Default value: \fB0\fR% (disabled).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_arc_sys_free\fR (ulong)
+.ad
+.RS 12n
+The target number of bytes the ARC should leave as free memory on the system.
+Defaults to the larger of 1/64 of physical memory or 512K. Setting this
+option to a non-zero value will override the default.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_autoimport_disable\fR (int)
+.ad
+.RS 12n
+Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR).
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR for no.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_checksums_per_second\fR (int)
+.ad
+.RS 12n
+Rate limit checksum events to this many per second. Note that this should
+not be set below the zed thresholds (currently 10 checksums over 10 sec)
+or else zed may not trigger any action.
+.sp
+Default value: \fB20\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_commit_timeout_pct\fR (int)
+.ad
+.RS 12n
+This controls the amount of time that a ZIL block (lwb) will remain "open"
+when it isn't "full", and it has a thread waiting for it to be committed to
+stable storage. The timeout is scaled based on a percentage of the last lwb
+latency to avoid significantly impacting the latency of each individual
+transaction record (itx).
+.sp
+Default value: \fB5\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_condense_indirect_vdevs_enable\fR (int)
+.ad
+.RS 12n
+Enable condensing indirect vdev mappings. When set to a non-zero value,
+attempt to condense indirect vdev mappings if the mapping uses more than
+\fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete
+space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR
+bytes on-disk. The condensing process is an attempt to save memory by
+removing obsolete mappings.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_condense_max_obsolete_bytes\fR (ulong)
+.ad
+.RS 12n
+Only attempt to condense indirect vdev mappings if the on-disk size
+of the obsolete space map object is greater than this number of bytes
+(see \fBzfs_condense_indirect_vdevs_enable\fR).
+.sp
+Default value: \fB1,073,741,824\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_condense_min_mapping_bytes\fR (ulong)
+.ad
+.RS 12n
+Minimum size vdev mapping to attempt to condense (see
+\fBzfs_condense_indirect_vdevs_enable\fR).
+.sp
+Default value: \fB131,072\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dbgmsg_enable\fR (int)
+.ad
+.RS 12n
+Internally ZFS keeps a small log to facilitate debugging. By default the log
+is disabled, to enable it set this option to 1. The contents of the log can
+be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to
+this proc file clears the log.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dbgmsg_maxsize\fR (int)
+.ad
+.RS 12n
+The maximum size in bytes of the internal ZFS debug log.
+.sp
+Default value: \fB4M\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dbuf_state_index\fR (int)
+.ad
+.RS 12n
+This feature is currently unused. It is normally used for controlling what
+reporting is available under /proc/spl/kstat/zfs.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_enabled\fR (int)
+.ad
+.RS 12n
+When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR
+milliseconds, or when an individual I/O takes longer than
+\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to
+be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is
+invoked as described by the \fBzfs_deadman_failmode\fR module option.
+By default the deadman is enabled and configured to \fBwait\fR which results
+in "hung" I/Os only being logged. The deadman is automatically disabled
+when a pool gets suspended.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_failmode\fR (charp)
+.ad
+.RS 12n
+Controls the failure behavior when the deadman detects a "hung" I/O. Valid
+values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR.
+.sp
+\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a
+"deadman" event will be posted describing that I/O.
+.sp
+\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it
+to the I/O pipeline if possible.
+.sp
+\fBpanic\fR - Panic the system. This can be used to facilitate an automatic
+fail-over to a properly configured fail-over partner.
+.sp
+Default value: \fBwait\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_checktime_ms\fR (int)
+.ad
+.RS 12n
+Check time in milliseconds. This defines the frequency at which we check
+for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior.
+.sp
+Default value: \fB60,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_synctime_ms\fR (ulong)
+.ad
+.RS 12n
+Interval in milliseconds after which the deadman is triggered and also
+the interval after which a pool sync operation is considered to be "hung".
+Once this limit is exceeded the deadman will be invoked every
+\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes.
+.sp
+Default value: \fB600,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_deadman_ziotime_ms\fR (ulong)
+.ad
+.RS 12n
+Interval in milliseconds after which the deadman is triggered and an
+individual IO operation is considered to be "hung". As long as the I/O
+remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR
+milliseconds until the I/O completes.
+.sp
+Default value: \fB300,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dedup_prefetch\fR (int)
+.ad
+.RS 12n
+Enable prefetching dedup-ed blks
+.sp
+Use \fB1\fR for yes and \fB0\fR to disable (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_delay_min_dirty_percent\fR (int)
+.ad
+.RS 12n
+Start to delay each transaction once there is this amount of dirty data,
+expressed as a percentage of \fBzfs_dirty_data_max\fR.
+This value should be >= zfs_vdev_async_write_active_max_dirty_percent.
+See the section "ZFS TRANSACTION DELAY".
+.sp
+Default value: \fB60\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_delay_scale\fR (int)
+.ad
+.RS 12n
+This controls how quickly the transaction delay approaches infinity.
+Larger values cause longer delays for a given amount of dirty data.
+.sp
+For the smoothest delay, this value should be about 1 billion divided
+by the maximum number of operations per second. This will smoothly
+handle between 10x and 1/10th this number.
+.sp
+See the section "ZFS TRANSACTION DELAY".
+.sp
+Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64.
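+.sp
+As a worked example, for a pool expected to sustain roughly 100,000
+operations per second, the guideline above suggests (figures illustrative only):
+.sp
+.nf
+        zfs_delay_scale = 1,000,000,000 / 100,000 = 10,000
+.fi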
+.sp
+Default value: \fB500,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_delays_per_second\fR (int)
+.ad
+.RS 12n
+Rate limit IO delay events to this many per second.
+.sp
+Default value: \fB20\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_delete_blocks\fR (ulong)
+.ad
+.RS 12n
+This is used to define a large file for the purposes of delete. Files
+containing more than \fBzfs_delete_blocks\fR blocks will be deleted asynchronously
+while smaller files are deleted synchronously. Decreasing this value will
+reduce the time spent in an unlink(2) system call at the expense of a longer
+delay before the freed space is available.
+.sp
+Default value: \fB20,480\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dirty_data_max\fR (int)
+.ad
+.RS 12n
+Determines the dirty space limit in bytes. Once this limit is exceeded, new
+writes are halted until space frees up. This parameter takes precedence
+over \fBzfs_dirty_data_max_percent\fR.
+See the section "ZFS TRANSACTION DELAY".
+.sp
+Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dirty_data_max_max\fR (int)
+.ad
+.RS 12n
+Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes.
+This limit is only enforced at module load time, and will be ignored if
+\fBzfs_dirty_data_max\fR is later changed. This parameter takes
+precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section
+"ZFS TRANSACTION DELAY".
+.sp
+Default value: \fB25\fR% of physical RAM.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dirty_data_max_max_percent\fR (int)
+.ad
+.RS 12n
+Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a
+percentage of physical RAM. This limit is only enforced at module load
+time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed.
+The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this
+one. See the section "ZFS TRANSACTION DELAY".
+.sp
+Default value: \fB25\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dirty_data_max_percent\fR (int)
+.ad
+.RS 12n
+Determines the dirty space limit, expressed as a percentage of all
+memory. Once this limit is exceeded, new writes are halted until space frees
+up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this
+one. See the section "ZFS TRANSACTION DELAY".
+.sp
+Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dirty_data_sync\fR (int)
+.ad
+.RS 12n
+Start syncing out a transaction group if there is at least this much dirty data.
+.sp
+Default value: \fB67,108,864\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_fletcher_4_impl\fR (string)
+.ad
+.RS 12n
+Select a fletcher 4 implementation.
+.sp
+Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
+\fBavx2\fR, \fBavx512f\fR, and \fBaarch64_neon\fR.
+All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
+set extensions to be available and will only appear if ZFS detects that they are
+present at runtime. If multiple implementations of fletcher 4 are available,
+the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
+results in the original, CPU-based calculation being used. Selecting any option
+other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
+the respective CPU instruction set being used.
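+.sp
+As an illustration (assuming the parameter is exposed under the usual
+\fB/sys/module/zfs/parameters\fR directory), the selection can be inspected
+and changed at runtime:
+.sp
+.nf
+        cat /sys/module/zfs/parameters/zfs_fletcher_4_impl
+        echo ssse3 > /sys/module/zfs/parameters/zfs_fletcher_4_impl
+.fi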
+.sp
+Default value: \fBfastest\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_free_bpobj_enabled\fR (int)
+.ad
+.RS 12n
+Enable/disable the processing of the free_bpobj object.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_async_block_max_blocks\fR (ulong)
+.ad
+.RS 12n
+Maximum number of blocks freed in a single txg.
+.sp
+Default value: \fB100,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_override_estimate_recordsize\fR (ulong)
+.ad
+.RS 12n
+Record size calculation override for zfs send estimates.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_read_max_active\fR (int)
+.ad
+.RS 12n
+Maximum asynchronous read I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB3\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_read_min_active\fR (int)
+.ad
+.RS 12n
+Minimum asynchronous read I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int)
+.ad
+.RS 12n
+When the pool has more than
+\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use
+\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If
+the dirty data is between min and max, the active I/O limit is linearly
+interpolated. See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB60\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int)
+.ad
+.RS 12n
+When the pool has less than
+\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use
+\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If
+the dirty data is between min and max, the active I/O limit is linearly
+interpolated. See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB30\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_write_max_active\fR (int)
+.ad
+.RS 12n
+Maximum asynchronous write I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_async_write_min_active\fR (int)
+.ad
+.RS 12n
+Minimum asynchronous write I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Lower values are associated with better latency on rotational media but poorer
+resilver performance. The default value of 2 was chosen as a compromise. A
+value of 3 has been shown to improve resilver performance further at a cost of
+further increasing latency.
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_max_active\fR (int)
+.ad
+.RS 12n
+The maximum number of I/Os active to each device. Ideally, this will be >=
+the sum of each queue's max_active. It must be at least the sum of each
+queue's min_active. See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB1,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_scrub_max_active\fR (int)
+.ad
+.RS 12n
+Maximum scrub I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_scrub_min_active\fR (int)
+.ad
+.RS 12n
+Minimum scrub I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_sync_read_max_active\fR (int)
+.ad
+.RS 12n
+Maximum synchronous read I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_sync_read_min_active\fR (int)
+.ad
+.RS 12n
+Minimum synchronous read I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_sync_write_max_active\fR (int)
+.ad
+.RS 12n
+Maximum synchronous write I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_sync_write_min_active\fR (int)
+.ad
+.RS 12n
+Minimum synchronous write I/Os active to each device.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB10\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_queue_depth_pct\fR (int)
+.ad
+.RS 12n
+Maximum number of queued allocations per top-level vdev expressed as
+a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the
+system to detect devices that are more capable of handling allocations
+and to allocate more blocks to those devices. It allows for dynamic
+allocation distribution when devices are imbalanced as fuller devices
+will tend to be slower than empty devices.
+
+See also \fBzio_dva_throttle_enabled\fR.
+.sp
+Default value: \fB1000\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_expire_snapshot\fR (int)
+.ad
+.RS 12n
+Seconds to expire .zfs/snapshot
+.sp
+Default value: \fB300\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_admin_snapshot\fR (int)
+.ad
+.RS 12n
+Allow the creation, removal, or renaming of entries in the .zfs/snapshot
+directory to cause the creation, destruction, or renaming of snapshots.
+When enabled this functionality works both locally and over NFS exports
+which have the 'no_root_squash' option set. This functionality is disabled
+by default.
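+.sp
+For example, with this functionality enabled, a snapshot of a dataset
+mounted at the hypothetical path \fI/tank/data\fR could be created or
+destroyed through the control directory:
+.sp
+.nf
+        mkdir /tank/data/.zfs/snapshot/mysnap
+        rmdir /tank/data/.zfs/snapshot/mysnap
+.fi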
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_flags\fR (int)
+.ad
+.RS 12n
+Set additional debugging flags. The following flags may be bitwise-or'd
+together.
+.sp
+.TS
+box;
+rB lB
+lB lB
+r l.
+Value Symbolic Name
+ Description
+_
+1 ZFS_DEBUG_DPRINTF
+ Enable dprintf entries in the debug log.
+_
+2 ZFS_DEBUG_DBUF_VERIFY *
+ Enable extra dbuf verifications.
+_
+4 ZFS_DEBUG_DNODE_VERIFY *
+ Enable extra dnode verifications.
+_
+8 ZFS_DEBUG_SNAPNAMES
+ Enable snapshot name verification.
+_
+16 ZFS_DEBUG_MODIFY
+ Check for illegally modified ARC buffers.
+_
+64 ZFS_DEBUG_ZIO_FREE
+ Enable verification of block frees.
+_
+128 ZFS_DEBUG_HISTOGRAM_VERIFY
+ Enable extra spacemap histogram verifications.
+_
+256 ZFS_DEBUG_METASLAB_VERIFY
+ Verify space accounting on disk matches in-core range_trees.
+_
+512 ZFS_DEBUG_SET_ERROR
+ Enable SET_ERROR and dprintf entries in the debug log.
+.TE
+.sp
+* Requires debug build.
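+.sp
+As an illustration, to enable both dprintf entries (value 1) and SET_ERROR
+entries (value 512), the values from the table above are bitwise-or'd
+together (runtime path assumed to be the usual module parameters directory):
+.sp
+.nf
+        zfs_flags = 1 | 512 = 513
+        echo 513 > /sys/module/zfs/parameters/zfs_flags
+.fi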
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_free_leak_on_eio\fR (int)
+.ad
+.RS 12n
+If destroy encounters an EIO while reading metadata (e.g. indirect
+blocks), space referenced by the missing metadata cannot be freed.
+Normally this causes the background destroy to become "stalled", as
+it is unable to make forward progress. While in this stalled state,
+all remaining space to free from the error-encountering filesystem is
+"temporarily leaked". Set this flag to cause it to ignore the EIO,
+permanently leak the space from indirect blocks that cannot be read,
+and continue to free everything else that it can.
+
+The default, "stalling" behavior is useful if the storage partially
+fails (i.e. some but not all i/os fail), and then later recovers. In
+this case, we will be able to continue pool operations while it is
+partially failed, and when it recovers, we can continue to free the
+space, with no leaks. However, note that this case is actually
+fairly rare.
+
+Typically pools either (a) fail completely (but perhaps temporarily,
+e.g. a top-level vdev going offline), or (b) have localized,
+permanent errors (e.g. disk returns the wrong data due to bit flip or
+firmware bug). In case (a), this setting does not matter because the
+pool will be suspended and the sync thread will not be able to make
+forward progress regardless. In case (b), because the error is
+permanent, the best we can do is leak the minimum amount of space,
+which is what setting this flag will do. Therefore, it is reasonable
+for this flag to normally be set, but we chose the more conservative
+approach of not setting it, so that there is no possibility of
+leaking space in the "partial temporary" failure case.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_free_min_time_ms\fR (int)
+.ad
+.RS 12n
+During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum
+of this much time will be spent working on freeing blocks per txg.
+.sp
+Default value: \fB1,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_immediate_write_sz\fR (long)
+.ad
+.RS 12n
+Largest data block to write to zil. Larger blocks will be treated as if the
+dataset being written to had the property setting \fBlogbias=throughput\fR.
+.sp
+Default value: \fB32,768\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_max_recordsize\fR (int)
+.ad
+.RS 12n
+We currently support block sizes from 512 bytes to 16MB. The benefits of
+larger blocks, and thus larger IO, need to be weighed against the cost of
+COWing a giant block to modify one byte. Additionally, very large blocks
+can have an impact on i/o latency, and also potentially on the memory
+allocator. Therefore, we do not allow the recordsize to be set larger than
+zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
+this tunable, and pools with larger blocks can always be imported and used,
+regardless of this setting.
+.sp
+Default value: \fB1,048,576\fR.
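+.sp
+As a sketch (the pool and dataset names are hypothetical, and the
+\fBlarge_blocks\fR pool feature must be enabled), the limit can be raised and a
+larger recordsize selected with:
+.nf
+
+  echo 4194304 > /sys/module/zfs/parameters/zfs_max_recordsize
+  zfs set recordsize=4M tank/fs
+
+.fi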
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_metaslab_fragmentation_threshold\fR (int)
+.ad
+.RS 12n
+Allow metaslabs to keep their active state as long as their fragmentation
+percentage is less than or equal to this value. An active metaslab that
+exceeds this threshold will no longer keep its active status allowing
+better metaslabs to be selected.
+.sp
+Default value: \fB70\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_mg_fragmentation_threshold\fR (int)
+.ad
+.RS 12n
+Metaslab groups are considered eligible for allocations if their
+fragmentation metric (measured as a percentage) is less than or equal to
+this value. If a metaslab group exceeds this threshold then it will be
+skipped unless all metaslab groups within the metaslab class have also
+crossed this threshold.
+.sp
+Default value: \fB85\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_mg_noalloc_threshold\fR (int)
+.ad
+.RS 12n
+Defines a threshold at which metaslab groups should be eligible for
+allocations. The value is expressed as a percentage of free space
+beyond which a metaslab group is always eligible for allocations.
+If a metaslab group's free space is less than or equal to the
+threshold, the allocator will avoid allocating to that group
+unless all groups in the pool have reached the threshold. Once all
+groups have reached the threshold, all groups are allowed to accept
+allocations. The default value of 0 disables the feature and causes
+all metaslab groups to be eligible for allocations.
+
+This parameter allows one to deal with pools having heavily imbalanced
+vdevs such as would be the case when a new vdev has been added.
+Setting the threshold to a non-zero percentage will stop allocations
+from being made to vdevs that aren't filled to the specified percentage
+and allow lesser filled vdevs to acquire more allocations than they
+otherwise would under the old \fBzfs_mg_alloc_failures\fR facility.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_multihost_history\fR (int)
+.ad
+.RS 12n
+Historical statistics for the last N multihost updates will be available in
+\fB/proc/spl/kstat/zfs/<pool>/multihost\fR
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_multihost_interval\fR (ulong)
+.ad
+.RS 12n
+Used to control the frequency of multihost writes which are performed when the
+\fBmultihost\fR pool property is on. This is one factor used to determine
+the length of the activity check during import.
+.sp
+The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR milliseconds.
+This means that on average a multihost write will be issued for each leaf vdev every
+\fBzfs_multihost_interval\fR milliseconds. In practice, the observed period can
+vary with the I/O load and this observed value is the delay which is stored in
+the uberblock.
+.sp
+On import the activity check waits a minimum amount of time determined by
+\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR. The activity
+check time may be further extended if the value of mmp delay found in the best
+uberblock indicates actual multihost updates happened at longer intervals than
+\fBzfs_multihost_interval\fR. A minimum value of \fB100ms\fR is enforced.
+.sp
+Default value: \fB1000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_multihost_import_intervals\fR (uint)
+.ad
+.RS 12n
+Used to control the duration of the activity test on import. Smaller values of
+\fBzfs_multihost_import_intervals\fR will reduce the import time but increase
+the risk of failing to detect an active pool. The total activity check time is
+never allowed to drop below one second. A value of 0 is ignored and treated as
+if it were set to 1.
+.sp
+Default value: \fB10\fR.
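+.sp
+For example, with the default \fBzfs_multihost_interval\fR of 1000ms and the
+default value of 10 here, the activity check on import waits at least
+1000ms * 10 = 10 seconds before the pool may be imported.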
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_multihost_fail_intervals\fR (uint)
+.ad
+.RS 12n
+Controls the behavior of the pool when multihost write failures are detected.
+.sp
+When \fBzfs_multihost_fail_intervals = 0\fR then multihost write failures are ignored.
+The failures will still be reported to the ZED which depending on its
+configuration may take action such as suspending the pool or offlining a device.
+.sp
+When \fBzfs_multihost_fail_intervals > 0\fR then sequential multihost write failures
+will cause the pool to be suspended. This occurs when
+\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds have
+passed since the last successful multihost write. This guarantees the activity test
+will see multihost writes if the pool is imported.
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_no_scrub_io\fR (int)
+.ad
+.RS 12n
+Set for no scrub I/O. This results in scrubs not actually scrubbing data and
+simply doing a metadata crawl of the pool instead.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_no_scrub_prefetch\fR (int)
+.ad
+.RS 12n
+Set to disable block prefetching for scrubs.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_nocacheflush\fR (int)
+.ad
+.RS 12n
+Disable cache flush operations on disks when writing. Beware, this may cause
+corruption if disks re-order writes.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_nopwrite_enabled\fR (int)
+.ad
+.RS 12n
+Enable NOP writes
+.sp
+Use \fB1\fR for yes (default) and \fB0\fR to disable.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_dmu_offset_next_sync\fR (int)
+.ad
+.RS 12n
+Enable forcing txg sync to find holes. When enabled, this forces ZFS to act
+like prior versions when SEEK_HOLE or SEEK_DATA flags are used: if a dnode
+is dirty, txgs are synced so that up-to-date hole information can be
+reported.
+.sp
+Use \fB1\fR for yes and \fB0\fR to disable (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_pd_bytes_max\fR (int)
+.ad
+.RS 12n
+The number of bytes which should be prefetched during a pool traversal
+(eg: \fBzfs send\fR or other data crawling operations)
+.sp
+Default value: \fB52,428,800\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_per_txg_dirty_frees_percent\fR (ulong)
+.ad
+.RS 12n
+Tunable to control percentage of dirtied blocks from frees in one TXG.
+After this threshold is crossed, additional dirty blocks from frees
+wait until the next TXG.
+A value of zero will disable this throttle.
+.sp
+Default value: \fB30\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_prefetch_disable\fR (int)
+.ad
+.RS 12n
+This tunable disables predictive prefetch. Note that it leaves "prescient"
+prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
+prescient prefetch never issues i/os that end up not being needed, so it
+can't hurt performance.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_chunk_size\fR (long)
+.ad
+.RS 12n
+Bytes to read per chunk
+.sp
+Default value: \fB1,048,576\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_history\fR (int)
+.ad
+.RS 12n
+Historical statistics for the last N reads will be available in
+\fB/proc/spl/kstat/zfs/<pool>/reads\fR
+.sp
+Default value: \fB0\fR (no data is kept).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_history_hits\fR (int)
+.ad
+.RS 12n
+Include cache hits in read history
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_reconstruct_indirect_combinations_max\fR (int)
+.ad
+.RS 12n
+If an indirect split block contains more than this many possible unique
+combinations when being reconstructed, consider it too computationally
+expensive to check them all. Instead, try at most
+\fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
+combinations each time the block is accessed. This allows all segment
+copies to participate fairly in the reconstruction when all combinations
+cannot be checked and prevents repeated use of one bad copy.
+.sp
+Default value: \fB100\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_recover\fR (int)
+.ad
+.RS 12n
+Set to attempt to recover from fatal errors. This should only be used as a
+last resort, as it typically results in leaked space, or worse.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_resilver_min_time_ms\fR (int)
+.ad
+.RS 12n
+Resilvers are processed by the sync thread. While resilvering it will spend
+at least this much time working on a resilver between txg flushes.
+.sp
+Default value: \fB3,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_ignore_errors\fR (int)
+.ad
+.RS 12n
+If set to a nonzero value, remove the DTL (dirty time list) upon
+completion of a pool scan (scrub) even if there were unrepairable
+errors. It is intended to be used during pool repair or recovery to
+stop resilvering when the pool is next imported.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scrub_min_time_ms\fR (int)
+.ad
+.RS 12n
+Scrubs are processed by the sync thread. While scrubbing it will spend
+at least this much time working on a scrub between txg flushes.
+.sp
+Default value: \fB1,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_checkpoint_intval\fR (int)
+.ad
+.RS 12n
+To preserve progress across reboots the sequential scan algorithm periodically
+needs to stop metadata scanning and issue all the verification I/Os to disk.
+The frequency of this flushing is determined by the
+\fBzfs_scan_checkpoint_intval\fR tunable.
+.sp
+Default value: \fB7200\fR seconds (every 2 hours).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_fill_weight\fR (int)
+.ad
+.RS 12n
+This tunable affects how scrub and resilver I/O segments are ordered. A higher
+number indicates that we care more about how filled in a segment is, while a
+lower number indicates we care more about the size of the extent without
+considering the gaps within a segment. This value is only tunable upon module
+insertion. Changing the value afterwards will have no effect on scrub or
+resilver performance.
+.sp
+Default value: \fB3\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_issue_strategy\fR (int)
+.ad
+.RS 12n
+Determines the order that data will be verified while scrubbing or resilvering.
+If set to \fB1\fR, data will be verified as sequentially as possible, given the
+amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
+may improve scrub performance if the pool's data is very fragmented. If set to
+\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
+first. By deferring scrubbing of small segments, we may later find adjacent data
+to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
+strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
+checkpoint.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_legacy\fR (int)
+.ad
+.RS 12n
+A value of 0 indicates that scrubs and resilvers will gather metadata in
+memory before issuing sequential I/O. A value of 1 indicates that the legacy
+algorithm will be used where I/O is initiated as soon as it is discovered.
+Changing this value to 0 will not affect scrubs or resilvers that are already
+in progress.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_max_ext_gap\fR (int)
+.ad
+.RS 12n
+Indicates the largest gap in bytes between scrub / resilver I/Os that will still
+be considered sequential for sorting purposes. Changing this value will not
+affect scrubs or resilvers that are already in progress.
+.sp
+Default value: \fB2097152 (2 MB)\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_mem_lim_fact\fR (int)
+.ad
+.RS 12n
+Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
+This tunable determines the hard limit for I/O sorting memory usage.
+When the hard limit is reached we stop scanning metadata and start issuing
+data verification I/O. This is done until we get below the soft limit.
+.sp
+Default value: \fB20\fR which is 5% of RAM (1/20).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_mem_lim_soft_fact\fR (int)
+.ad
+.RS 12n
+The fraction of the hard limit used to determine the soft limit for I/O sorting
+by the sequential scan algorithm. When we cross this limit from below no action
+is taken. When we cross this limit from above it is because we are issuing
+verification I/O. In this case (unless the metadata scan is done) we stop
+issuing verification I/O and start scanning metadata again until we get to the
+hard limit.
+.sp
+Default value: \fB20\fR which is 5% of the hard limit (1/20).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_vdev_limit\fR (int)
+.ad
+.RS 12n
+Maximum amount of data that can be concurrently issued for scrubs and
+resilvers per leaf device, given in bytes.
+.sp
+Default value: \fB41943040\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_send_corrupt_data\fR (int)
+.ad
+.RS 12n
+Allow sending of corrupt data (ignore read/checksum errors when sending data)
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_send_queue_length\fR (int)
+.ad
+.RS 12n
+The maximum number of bytes allowed in the \fBzfs send\fR queue. This value
+must be at least twice the maximum block size in use.
+.sp
+Default value: \fB16,777,216\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_recv_queue_length\fR (int)
+.ad
+.RS 12n
+The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
+must be at least twice the maximum block size in use.
+.sp
+Default value: \fB16,777,216\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_sync_pass_deferred_free\fR (int)
+.ad
+.RS 12n
+Flushing of data to disk is done in passes. Defer frees starting in this pass
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_sync_pass_dont_compress\fR (int)
+.ad
+.RS 12n
+Don't compress starting in this pass
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_sync_pass_rewrite\fR (int)
+.ad
+.RS 12n
+Rewrite new block pointers starting in this pass
+.sp
+Default value: \fB2\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_sync_taskq_batch_pct\fR (int)
+.ad
+.RS 12n
+This controls the number of threads used by the dp_sync_taskq. The default
+value of 75% will create a maximum of one thread per cpu.
+.sp
+Default value: \fB75\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_txg_history\fR (int)
+.ad
+.RS 12n
+Historical statistics for the last N txgs will be available in
+\fB/proc/spl/kstat/zfs/<pool>/txgs\fR
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_txg_timeout\fR (int)
+.ad
+.RS 12n
+Flush dirty data to disk at least every N seconds (maximum txg duration)
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_aggregation_limit\fR (int)
+.ad
+.RS 12n
+Max vdev I/O aggregation size
+.sp
+Default value: \fB131,072\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_cache_bshift\fR (int)
+.ad
+.RS 12n
+Shift size to inflate reads to.
+.sp
+Default value: \fB16\fR (effectively 65536).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_cache_max\fR (int)
+.ad
+.RS 12n
+Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR
+size (default 64k).
+.sp
+Default value: \fB16384\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_cache_size\fR (int)
+.ad
+.RS 12n
+Total size of the per-disk cache in bytes.
+.sp
+Currently this feature is disabled as it has been found to not be helpful
+for performance and in some cases harmful.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_mirror_rotating_inc\fR (int)
+.ad
+.RS 12n
+A number by which the balancing algorithm increments the load calculation, for
+the purpose of selecting the least busy mirror member, when an I/O immediately
+follows its predecessor on rotational vdevs.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
+.ad
+.RS 12n
+A number by which the balancing algorithm increments the load calculation for
+the purpose of selecting the least busy mirror member when an I/O lacks
+locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
+this window that do not immediately follow the previous I/O are incremented by
+half this value.
+.sp
+Default value: \fB5\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
+.ad
+.RS 12n
+The maximum distance for the last queued I/O in which the balancing algorithm
+considers an I/O to have locality.
+See the section "ZFS I/O SCHEDULER".
+.sp
+Default value: \fB1048576\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
+.ad
+.RS 12n
+A number by which the balancing algorithm increments the load calculation for
+the purpose of selecting the least busy mirror member on non-rotational vdevs
+when I/Os do not immediately follow one another.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
+.ad
+.RS 12n
+A number by which the balancing algorithm increments the load calculation for
+the purpose of selecting the least busy mirror member when an I/O lacks
+locality as defined by \fBzfs_vdev_mirror_rotating_seek_offset\fR. I/Os within
+this window that do not immediately follow the previous I/O are incremented by
+half this value.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_read_gap_limit\fR (int)
+.ad
+.RS 12n
+Aggregate read I/O operations if the gap on-disk between them is within this
+threshold.
+.sp
+Default value: \fB32,768\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_scheduler\fR (charp)
+.ad
+.RS 12n
+Set the Linux I/O scheduler on whole disk vdevs to this scheduler. Valid options
+are noop, cfq, bfq and deadline.
+.sp
+Default value: \fBnoop\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_write_gap_limit\fR (int)
+.ad
+.RS 12n
+Aggregate write I/O operations if the gap on-disk between them is within this
+threshold.
+.sp
+Default value: \fB4,096\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_vdev_raidz_impl\fR (string)
+.ad
+.RS 12n
+Parameter for selecting raidz parity implementation to use.
+
+Options marked (always) below may be selected on module load as they are
+supported on all systems.
+The remaining options may only be set after the module is loaded, as they
+are available only if the implementations are compiled in and supported
+on the running system.
+
+Once the module is loaded, the content of
+/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options
+with the currently selected one enclosed in [].
+Possible options are:
+ fastest - (always) implementation selected using built-in benchmark
+ original - (always) original raidz implementation
+ scalar - (always) scalar raidz implementation
+ sse2 - implementation using SSE2 instruction set (64bit x86 only)
+ ssse3 - implementation using SSSE3 instruction set (64bit x86 only)
+ avx2 - implementation using AVX2 instruction set (64bit x86 only)
+ avx512f - implementation using AVX512F instruction set (64bit x86 only)
+ avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only)
+ aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only)
+ aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only)
+.sp
+Default value: \fBfastest\fR.
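+.sp
+For example (assuming the module is loaded and AVX2 is supported on the
+running system), the selection can be inspected and changed at runtime:
+.nf
+
+  cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
+  echo avx2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl
+
+.fi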
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zevent_cols\fR (int)
+.ad
+.RS 12n
+When zevents are logged to the console use this as the word wrap width.
+.sp
+Default value: \fB80\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zevent_console\fR (int)
+.ad
+.RS 12n
+Log events to the console
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zevent_len_max\fR (int)
+.ad
+.RS 12n
+Max event queue length. A value of 0 will result in a calculated value which
+increases with the number of CPUs in the system (minimum 64 events). Events
+in the queue can be viewed with the \fBzpool events\fR command.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zil_clean_taskq_maxalloc\fR (int)
+.ad
+.RS 12n
+The maximum number of taskq entries that are allowed to be cached. When this
+limit is exceeded transaction records (itxs) will be cleaned synchronously.
+.sp
+Default value: \fB1048576\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zil_clean_taskq_minalloc\fR (int)
+.ad
+.RS 12n
+The number of taskq entries that are pre-populated when the taskq is first
+created and are immediately available for use.
+.sp
+Default value: \fB1024\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_zil_clean_taskq_nthr_pct\fR (int)
+.ad
+.RS 12n
+This controls the number of threads used by the dp_zil_clean_taskq. The default
+value of 100% will create a maximum of one thread per cpu.
+.sp
+Default value: \fB100\fR%.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzil_replay_disable\fR (int)
+.ad
+.RS 12n
+Disable intent logging replay. Replay can be disabled to aid recovery from a
+corrupted ZIL.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzil_slog_bulk\fR (ulong)
+.ad
+.RS 12n
+Limit SLOG write size per commit executed with synchronous priority.
+Any writes above that will be executed with lower (asynchronous) priority
+to limit potential SLOG device abuse by a single active ZIL writer.
+.sp
+Default value: \fB786,432\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_delay_max\fR (int)
+.ad
+.RS 12n
+A zevent will be logged if a ZIO operation takes more than N milliseconds to
+complete. Note that this is only a logging facility, not a timeout on
+operations.
+.sp
+Default value: \fB30,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_dva_throttle_enabled\fR (int)
+.ad
+.RS 12n
+Throttle block allocations in the ZIO pipeline. This allows for
+dynamic allocation distribution when devices are imbalanced.
+When enabled, the maximum number of pending allocations per top-level vdev
+is limited by \fBzfs_vdev_queue_depth_pct\fR.
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_requeue_io_start_cut_in_line\fR (int)
+.ad
+.RS 12n
+Prioritize requeued I/O
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzio_taskq_batch_pct\fR (uint)
+.ad
+.RS 12n
+Percentage of online CPUs (or CPU cores, etc) which will run a worker thread
+for IO. These workers are responsible for IO work such as compression and
+checksum calculations. A fractional number of CPUs will be rounded down.
+.sp
+The default value of 75 was chosen to avoid using all CPUs which can result in
+latency issues and inconsistent application performance, especially when high
+compression is enabled.
+.sp
+Default value: \fB75\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_inhibit_dev\fR (uint)
+.ad
+.RS 12n
+Do not create zvol device nodes. This may slightly improve startup time on
+systems with a very large number of zvols.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_major\fR (uint)
+.ad
+.RS 12n
+Major number for zvol block devices
+.sp
+Default value: \fB230\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_max_discard_blocks\fR (ulong)
+.ad
+.RS 12n
+Discard (aka TRIM) operations done on zvols will be done in batches of this
+many blocks, where block size is determined by the \fBvolblocksize\fR property
+of a zvol.
+.sp
+Default value: \fB16,384\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_prefetch_bytes\fR (uint)
+.ad
+.RS 12n
+When adding a zvol to the system prefetch \fBzvol_prefetch_bytes\fR
+from the start and end of the volume. Prefetching these regions
+of the volume is desirable because they are likely to be accessed
+immediately by \fBblkid(8)\fR or by the kernel scanning for a partition
+table.
+.sp
+Default value: \fB131,072\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_request_sync\fR (uint)
+.ad
+.RS 12n
+When processing I/O requests for a zvol, submit them synchronously. This
+effectively limits the queue depth to 1 for each I/O submitter. When set
+to 0 requests are handled asynchronously by a thread pool. The number of
+requests which can be handled concurrently is controlled by \fBzvol_threads\fR.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_threads\fR (uint)
+.ad
+.RS 12n
+Max number of threads which can handle zvol I/O requests concurrently.
+.sp
+Default value: \fB32\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzvol_volmode\fR (uint)
+.ad
+.RS 12n
+Defines zvol block device behaviour when \fBvolmode\fR is set to \fBdefault\fR.
+Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none).
+.sp
+Default value: \fB1\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_qat_disable\fR (int)
+.ad
+.RS 12n
+This tunable disables QAT hardware acceleration for gzip compression and
+AES-GCM encryption. It is available only if QAT acceleration is compiled in
+and the QAT driver is present.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.SH ZFS I/O SCHEDULER
+ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os.
+The I/O scheduler determines when and in what order those operations are
+issued. The I/O scheduler divides operations into five I/O classes
+prioritized in the following order: sync read, sync write, async read,
+async write, and scrub/resilver. Each queue defines the minimum and
+maximum number of concurrent operations that may be issued to the
+device. In addition, the device has an aggregate maximum,
+\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums
+must not exceed the aggregate maximum. If the sum of the per-queue
+maximums exceeds the aggregate maximum, then the number of active I/Os
+may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will
+be issued regardless of whether all per-queue minimums have been met.
+.sp
+For many physical devices, throughput increases with the number of
+concurrent operations, but latency typically suffers. Further, physical
+devices typically have a limit at which more concurrent operations have no
+effect on throughput or can actually cause it to decrease.
+.sp
+The scheduler selects the next operation to issue by first looking for an
+I/O class whose minimum has not been satisfied. Once all are satisfied and
+the aggregate maximum has not been hit, the scheduler looks for classes
+whose maximum has not been satisfied. Iteration through the I/O classes is
+done in the order specified above. No further operations are issued if the
+aggregate maximum number of concurrent operations has been hit or if there
+are no operations queued for an I/O class that has not hit its maximum.
+Every time an I/O is queued or an operation completes, the I/O scheduler
+looks for new operations to issue.
+.sp
+In general, smaller max_active's will lead to lower latency of synchronous
+operations. Larger max_active's may lead to higher overall throughput,
+depending on underlying storage.
+.sp
+The ratio of the queues' max_actives determines the balance of performance
+between reads, writes, and scrubs. E.g., increasing
+\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete
+more quickly, but will cause reads and writes to have higher latency and lower
+throughput.
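+.sp
+For instance, to temporarily favor a resilver during a maintenance window one
+might raise the scrub/resilver queue depth (the value shown is illustrative
+and the path assumes the usual module parameter location):
+.nf
+
+  echo 8 > /sys/module/zfs/parameters/zfs_vdev_scrub_max_active
+
+.fi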
+.sp
+All I/O classes have a fixed maximum number of outstanding operations
+except for the async write class. Asynchronous writes represent the data
+that is committed to stable storage during the syncing stage for
+transaction groups. Transaction groups enter the syncing state
+periodically so the number of queued async writes will quickly burst up
+and then bleed down to zero. Rather than servicing them as quickly as
+possible, the I/O scheduler changes the maximum number of active async
+write I/Os according to the amount of dirty data in the pool. Since
+both throughput and latency typically increase with the number of
+concurrent operations issued to physical devices, reducing the
+burstiness in the number of concurrent operations also stabilizes the
+response time of operations from other -- and in particular synchronous
+-- queues. In broad strokes, the I/O scheduler will issue more
+concurrent operations from the async write queue as there's more dirty
+data in the pool.
+.sp
+Async Writes
+.sp
+The number of concurrent operations issued for the async write I/O class
+follows a piece-wise linear function defined by a few adjustable points.
+.nf
+
+ | o---------| <-- zfs_vdev_async_write_max_active
+ ^ | /^ |
+ | | / | |
+active | / | |
+ I/O | / | |
+count | / | |
+ | / | |
+ |-------o | | <-- zfs_vdev_async_write_min_active
+ 0|_______^______|_________|
+ 0% | | 100% of zfs_dirty_data_max
+ | |
+ | `-- zfs_vdev_async_write_active_max_dirty_percent
+ `--------- zfs_vdev_async_write_active_min_dirty_percent
+
+.fi
+Until the amount of dirty data exceeds a minimum percentage of the dirty
+data allowed in the pool, the I/O scheduler will limit the number of
+concurrent operations to the minimum. As that threshold is crossed, the
+number of concurrent operations issued increases linearly to the maximum at
+the specified maximum percentage of the dirty data allowed in the pool.
+.sp
+Ideally, the amount of dirty data on a busy pool will stay in the sloped
+part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR
+and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the
+maximum percentage, this indicates that the rate of incoming data is
+greater than the rate that the backend storage can handle. In this case, we
+must further throttle incoming writes, as described in the next section.
+
+.SH ZFS TRANSACTION DELAY
+We delay transactions when we've determined that the backend storage
+isn't able to accommodate the rate of incoming writes.
+.sp
+If there is already a transaction waiting, we delay relative to when
+that transaction will finish waiting. This way the calculated delay time
+is independent of the number of threads concurrently executing
+transactions.
+.sp
+If we are the only waiter, wait relative to when the transaction
+started, rather than the current time. This credits the transaction for
+"time already served", e.g. reading indirect blocks.
+.sp
+The minimum time for a transaction to take is calculated as:
+.nf
+ min_time = zfs_delay_scale * (dirty - min) / (max - dirty)
+ min_time is then capped at 100 milliseconds.
+.fi
+.sp
+The delay has two degrees of freedom that can be adjusted via tunables. The
+percentage of dirty data at which we start to delay is defined by
+\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above
+\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to
+delay after writing at full speed has failed to keep up with the incoming write
+rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking,
+this variable determines the amount of delay at the midpoint of the curve.
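+.sp
+As a worked example, assuming the default \fBzfs_delay_scale\fR of 500,000
+(nanoseconds): when the amount of dirty data sits exactly halfway between the
+delay threshold and \fBzfs_dirty_data_max\fR, (dirty - min) equals
+(max - dirty), so min_time evaluates to \fBzfs_delay_scale\fR itself, i.e. the
+500us midpoint shown in the curves below.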
+.sp
+.nf
+delay
+ 10ms +-------------------------------------------------------------*+
+ | *|
+ 9ms + *+
+ | *|
+ 8ms + *+
+ | * |
+ 7ms + * +
+ | * |
+ 6ms + * +
+ | * |
+ 5ms + * +
+ | * |
+ 4ms + * +
+ | * |
+ 3ms + * +
+ | * |
+ 2ms + (midpoint) * +
+ | | ** |
+ 1ms + v *** +
+ | zfs_delay_scale ----------> ******** |
+ 0 +-------------------------------------*********----------------+
+ 0% <- zfs_dirty_data_max -> 100%
+.fi
+.sp
+Note that since the delay is added to the outstanding time remaining on the
+most recent transaction, the delay is effectively the inverse of IOPS.
+Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve
+was chosen such that small changes in the amount of accumulated dirty data
+in the first 3/4 of the curve yield relatively small differences in the
+amount of delay.
+.sp
+The effects can be easier to understand when the amount of delay is
+represented on a log scale:
+.sp
+.nf
+delay
+100ms +-------------------------------------------------------------++
+ + +
+ | |
+ + *+
+ 10ms + *+
+ + ** +
+ | (midpoint) ** |
+ + | ** +
+ 1ms + v **** +
+ + zfs_delay_scale ----------> ***** +
+ | **** |
+ + **** +
+100us + ** +
+ + * +
+ | * |
+ + * +
+ 10us + * +
+ + +
+ | |
+ + +
+ +--------------------------------------------------------------+
+ 0% <- zfs_dirty_data_max -> 100%
+.fi
+.sp
+Note here that only as the amount of dirty data approaches its limit does
+the delay start to increase rapidly. The goal of a properly tuned system
+should be to keep the amount of dirty data out of that range by first
+ensuring that the appropriate limits are set for the I/O scheduler to reach
+optimal throughput on the backend storage, and then by changing the value
+of \fBzfs_delay_scale\fR to increase the steepness of the curve.
diff --git a/man/man5/zpool-features.5 b/man/man5/zpool-features.5
new file mode 100644
index 000000000..ce34a05a2
--- /dev/null
+++ b/man/man5/zpool-features.5
@@ -0,0 +1,720 @@
+'\" te
+.\" Copyright (c) 2013, 2016 by Delphix. All rights reserved.
+.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
+.\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
+.\" The contents of this file are subject to the terms of the Common Development
+.\" and Distribution License (the "License"). You may not use this file except
+.\" in compliance with the License. You can obtain a copy of the license at
+.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
+.\"
+.\" See the License for the specific language governing permissions and
+.\" limitations under the License. When distributing Covered Code, include this
+.\" CDDL HEADER in each file and include the License file at
+.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
+.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
+.\" own identifying information:
+.\" Portions Copyright [yyyy] [name of copyright owner]
+.TH ZPOOL-FEATURES 5 "Aug 27, 2013"
+.SH NAME
+zpool\-features \- ZFS pool feature descriptions
+.SH DESCRIPTION
+.sp
+.LP
+ZFS pool on\-disk format versions are specified via "features" which replace
+the old on\-disk format numbers (the last supported on\-disk format number is
+28). To enable a feature on a pool use the \fBupgrade\fR subcommand of the
+\fBzpool\fR(8) command, or set the \fBfeature@\fR\fIfeature_name\fR property
+to \fBenabled\fR.
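+.sp
+.LP
+For example (the pool name is hypothetical), either of the following enables
+features on an existing pool:
+.nf
+
+  zpool set feature@async_destroy=enabled tank
+  zpool upgrade tank        # enables all supported features
+
+.fi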
+.sp
+.LP
+The pool format does not affect file system version compatibility or the ability
+to send file systems between pools.
+.sp
+.LP
+Since most features can be enabled independently of each other the on\-disk
+format of the pool is specified by the set of all features marked as
+\fBactive\fR on the pool. If the pool was created by another software version
+this set may include unsupported features.
+.SS "Identifying features"
+.sp
+.LP
+Every feature has a guid of the form \fIcom.example:feature_name\fR. The reverse
+DNS name ensures that the feature's guid is unique across all ZFS
+implementations. When unsupported features are encountered on a pool they will
+be identified by their guids. Refer to the documentation for the ZFS
+implementation that created the pool for information about those features.
+.sp
+.LP
+Each supported feature also has a short name. By convention a feature's short
+name is the portion of its guid which follows the ':' (e.g.
+\fIcom.example:feature_name\fR would have the short name \fIfeature_name\fR),
+however a feature's short name may differ across ZFS implementations if
+following the convention would result in name conflicts.
+.SS "Feature states"
+.sp
+.LP
+Features can be in one of three states:
+.sp
+.ne 2
+.na
+\fB\fBactive\fR\fR
+.ad
+.RS 12n
+This feature's on\-disk format changes are in effect on the pool. Support for
+this feature is required to import the pool in read\-write mode. If this
+feature is not read-only compatible, support is also required to import the pool
+in read\-only mode (see "Read\-only compatibility").
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBenabled\fR\fR
+.ad
+.RS 12n
+An administrator has marked this feature as enabled on the pool, but the
+feature's on\-disk format changes have not been made yet. The pool can still be
+imported by software that does not support this feature, but changes may be made
+to the on\-disk format at any time which will move the feature to the
+\fBactive\fR state. Some features may support returning to the \fBenabled\fR
+state after becoming \fBactive\fR. See feature\-specific documentation for
+details.
+.RE
+
+.sp
+.ne 2
+.na
+\fBdisabled\fR
+.ad
+.RS 12n
+This feature's on\-disk format changes have not been made and will not be made
+unless an administrator moves the feature to the \fBenabled\fR state. Features
+cannot be disabled once they have been enabled.
+.RE
+
+.sp
+.LP
+The state of supported features is exposed through pool properties of the form
+\fIfeature@short_name\fR.
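+.sp
+.LP
+For example, the state of an individual feature on a (hypothetical) pool named
+tank can be queried with:
+.nf
+
+  zpool get feature@async_destroy tank
+
+.fi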
+.SS "Read\-only compatibility"
+.sp
+.LP
+Some features may make on\-disk format changes that do not interfere with other
+software's ability to read from the pool. These features are referred to as
+"read\-only compatible". If all unsupported features on a pool are read\-only
+compatible, the pool can be imported in read\-only mode by setting the
+\fBreadonly\fR property during import (see \fBzpool\fR(8) for details on
+importing pools).
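+.sp
+.LP
+A minimal sketch of such an import (the pool name is hypothetical):
+.nf
+
+  zpool import -o readonly=on tank
+
+.fi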
+.SS "Unsupported features"
+.sp
+.LP
+For each unsupported feature enabled on an imported pool a pool property
+named \fIunsupported@feature_guid\fR will indicate why the import was allowed
+despite the unsupported feature. Possible values for this property are:
+
+.sp
+.ne 2
+.na
+\fB\fBinactive\fR\fR
+.ad
+.RS 12n
+The feature is in the \fBenabled\fR state and therefore the pool's on\-disk
+format is still compatible with software that does not support this feature.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBreadonly\fR\fR
+.ad
+.RS 12n
+The feature is read\-only compatible and the pool has been imported in
+read\-only mode.
+.RE
+
+.SS "Feature dependencies"
+.sp
+.LP
+Some features depend on other features being enabled in order to function
+properly. Enabling a feature will automatically enable any features it
+depends on.
+.SH FEATURES
+.sp
+.LP
+The following features are supported on this system:
+.sp
+.ne 2
+.na
+\fB\fBasync_destroy\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:async_destroy
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES none
+.TE
+
+Destroying a file system requires traversing all of its data in order to
+return its used space to the pool. Without \fBasync_destroy\fR the file system
+is not fully removed until all space has been reclaimed. If the destroy
+operation is interrupted by a reboot or power outage the next attempt to open
+the pool will need to complete the destroy operation synchronously.
+
+When \fBasync_destroy\fR is enabled the file system's data will be reclaimed
+by a background process, allowing the destroy operation to complete without
+traversing the entire file system. The background process is able to resume
+interrupted destroys after the pool has been opened, eliminating the need
+to finish interrupted destroys as part of the open operation. The amount
+of space remaining to be reclaimed by the background process is available
+through the \fBfreeing\fR property.
+
+This feature is only \fBactive\fR while \fBfreeing\fR is non\-zero.
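+.sp
+For instance, the space still to be reclaimed by a background destroy on a
+(hypothetical) pool named tank can be watched with:
+.nf
+
+  zpool get freeing tank
+
+.fi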
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBempty_bpobj\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:empty_bpobj
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES none
+.TE
+
+This feature increases the performance of creating and using a large
+number of snapshots of a single filesystem or volume, and also reduces
+the disk space required.
+
+When there are many snapshots, each snapshot uses many Block Pointer
+Objects (bpobj's) to track blocks associated with that snapshot.
+However, in common use cases, most of these bpobj's are empty. This
+feature allows us to create each bpobj on-demand, thus eliminating the
+empty bpobjs.
+
+This feature is \fBactive\fR while there are any filesystems, volumes,
+or snapshots which were created after enabling this feature.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBfilesystem_limits\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.joyent:filesystem_limits
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables filesystem and snapshot limits. These limits can be used
+to control how many filesystems and/or snapshots can be created at the point in
+the tree on which the limits are set.
+
+This feature is \fBactive\fR once either of the limit properties has been
+set on a dataset. Once activated the feature is never deactivated.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBlz4_compress\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.illumos:lz4_compress
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES none
+.TE
+
+\fBlz4\fR is a high-performance real-time compression algorithm that
+features significantly faster compression and decompression as well as a
+higher compression ratio than the older \fBlzjb\fR compression.
+Typically, \fBlz4\fR compression is approximately 50% faster on
+compressible data and 200% faster on incompressible data than
+\fBlzjb\fR. It is also approximately 80% faster on decompression, while
+giving approximately 10% better compression ratio.
+
+When the \fBlz4_compress\fR feature is set to \fBenabled\fR, the
+administrator can turn on \fBlz4\fR compression on any dataset on the
+pool using the \fBzfs\fR(8) command. Please note that doing so will
+immediately activate the \fBlz4_compress\fR feature on the underlying
+pool. Also, all newly written metadata
+will be compressed with the \fBlz4\fR algorithm. Since this feature is not
+read-only compatible, this operation will render the pool unimportable
+on systems without support for the \fBlz4_compress\fR feature. Booting
+off of \fBlz4\fR-compressed root pools is supported.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBspacemap_histogram\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:spacemap_histogram
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES none
+.TE
+
+This feature allows ZFS to maintain more information about how free space
+is organized within the pool. If this feature is \fBenabled\fR, ZFS will
+set this feature to \fBactive\fR when a new space map object is created or
+an existing space map is upgraded to the new format. Once the feature is
+\fBactive\fR, it will remain in that state until the pool is destroyed.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBmulti_vdev_crash_dump\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.joyent:multi_vdev_crash_dump
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES none
+.TE
+
+This feature allows a dump device to be configured with a pool comprised
+of multiple vdevs. Those vdevs may be arranged in any mirrored or raidz
+configuration.
+
+When the \fBmulti_vdev_crash_dump\fR feature is set to \fBenabled\fR,
+the administrator can use the \fBdumpadm\fR(1M) command to configure a
+dump device on a pool comprised of multiple vdevs.
+
+Under Linux this feature is registered for compatibility but not used.
+New pools created under Linux will have the feature \fBenabled\fR but
+will never transition to \fBactive\fR. This functionality is not
+required in order to support crash dumps under Linux. Existing pools
+where this feature is \fBactive\fR can be imported.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBextensible_dataset\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:extensible_dataset
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES none
+.TE
+
+This feature allows more flexible use of internal ZFS data structures,
+and exists for other features to depend on.
+
+This feature will be \fBactive\fR when the first dependent feature uses it,
+and will be returned to the \fBenabled\fR state when all datasets that use
+this feature are destroyed.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBbookmarks\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:bookmarks
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables use of the \fBzfs bookmark\fR subcommand.
+
+This feature is \fBactive\fR while any bookmarks exist in the pool.
+All bookmarks in the pool can be listed by running
+\fBzfs list -t bookmark -r \fIpoolname\fR\fR.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBenabled_txg\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:enabled_txg
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES none
+.TE
+
+Once this feature is enabled ZFS records the transaction group number
+in which new features are enabled. This has no user-visible impact,
+but other features may depend on this feature.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBhole_birth\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:hole_birth
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES enabled_txg
+.TE
+
+This feature improves performance of incremental sends ("zfs send -i")
+and receives for objects with many holes. The most common case of
+hole-filled objects is zvols.
+
+An incremental send stream from snapshot \fBA\fR to snapshot \fBB\fR
+contains information about every block that changed between \fBA\fR and
+\fBB\fR. Blocks which did not change between those snapshots can be
+identified and omitted from the stream using a piece of metadata called
+the 'block birth time', but birth times are not recorded for holes (blocks
+filled only with zeroes). Since holes created after \fBA\fR cannot be
+distinguished from holes created before \fBA\fR, information about every
+hole in the entire filesystem or zvol is included in the send stream.
+
+For workloads where holes are rare this is not a problem. However, when
+incrementally replicating filesystems or zvols with many holes (for
+example a zvol formatted with another filesystem) a lot of time will
+be spent sending and receiving unnecessary information about holes that
+already exist on the receiving side.
+
+Once the \fBhole_birth\fR feature has been enabled the block birth times
+of all new holes will be recorded. Incremental sends between snapshots
+created after this feature is enabled will use this new metadata to avoid
+sending information about holes that already exist on the receiving side.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBembedded_data\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:embedded_data
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES none
+.TE
+
+This feature improves the performance and compression ratio of
+highly-compressible blocks. Blocks whose contents can compress to 112 bytes
+or smaller can take advantage of this feature.
+
+When this feature is enabled, the contents of highly-compressible blocks are
+stored in the block "pointer" itself (a misnomer in this case, as it contains
+the compressed data, rather than a pointer to its location on disk). Thus
+the space of the block (one sector, typically 512 bytes or 4KB) is saved,
+and no additional i/o is needed to read and write the data block.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+
+.RE
+.sp
+.ne 2
+.na
+\fB\fBdevice_removal\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:device_removal
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES none
+.TE
+
+This feature enables the "zpool remove" subcommand to remove top-level
+vdevs, evacuating them to reduce the total size of the pool.
+
+This feature becomes \fBactive\fR when the "zpool remove" command is used
+on a top-level vdev, and will never return to being \fBenabled\fR.
+
+.RE
+.sp
+.ne 2
+.na
+\fB\fBobsolete_counts\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:obsolete_counts
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES device_removal
+.TE
+
+This feature is an enhancement of device_removal, which will over time
+reduce the memory used to track removed devices. When indirect blocks
+are freed or remapped, we note that their part of the indirect mapping
+is "obsolete", i.e. no longer needed. See also the \fBzfs remap\fR
+subcommand in \fBzfs\fR(1M).
+
+This feature becomes \fBactive\fR when the "zpool remove" command is
+used on a top-level vdev, and will never return to being \fBenabled\fR.
+
+.RE
+.sp
+.ne 2
+.na
+\fB\fBlarge_blocks\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.open-zfs:large_block
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+The \fBlarge_block\fR feature allows the record size on a dataset to be
+set larger than 128KB.
+
+This feature becomes \fBactive\fR once a dataset contains a file with
+a block size larger than 128KB, and will return to being \fBenabled\fR once all
+filesystems that have ever had their recordsize larger than 128KB are destroyed.
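+.sp
+As a sketch (the pool and dataset names are hypothetical), once the feature is
+enabled a larger record size can be selected per dataset:
+.nf
+
+  zpool set feature@large_blocks=enabled tank
+  zfs set recordsize=1M tank/fs
+
+.fi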
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBlarge_dnode\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.zfsonlinux:large_dnode
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+The \fBlarge_dnode\fR feature allows the size of dnodes in a dataset to be
+set larger than 512B.
+
+This feature becomes \fBactive\fR once a dataset contains an object with
+a dnode larger than 512B, which occurs as a result of setting the
+\fBdnodesize\fR dataset property to a value other than \fBlegacy\fR. The
+feature will return to being \fBenabled\fR once all filesystems that
+have ever contained a dnode larger than 512B are destroyed. Large dnodes
+allow more data to be stored in the bonus buffer, thus potentially
+improving performance by avoiding the use of spill blocks.
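+.sp
+For example (the dataset name is hypothetical), larger or automatically sized
+dnodes are selected per dataset via the \fBdnodesize\fR property:
+.nf
+
+  zfs set dnodesize=auto tank/fs
+
+.fi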
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBsha512\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.illumos:sha512
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables the use of the SHA-512/256 truncated hash algorithm
+(FIPS 180-4) for checksum and dedup. The native 64-bit arithmetic of
+SHA-512 provides an approximate 50% performance boost over SHA-256 on
+64-bit hardware and is thus a good minimum-change replacement candidate
+for systems where hash performance is important, but these systems
+cannot for whatever reason utilize the faster \fBskein\fR and
+\fBedonr\fR algorithms.
+
+When the \fBsha512\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBsha512\fR checksum on any dataset using the
+\fBzfs set checksum=sha512\fR(1M) command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBsha512\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBsha512\fR are destroyed.
+
+Booting off of pools utilizing SHA-512/256 is supported (provided that
+the updated GRUB stage2 module is installed).
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBskein\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.illumos:skein
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables the use of the Skein hash algorithm for checksum
+and dedup. Skein is a high-performance secure hash algorithm that was a
+finalist in the NIST SHA-3 competition. It provides a very high security
+margin and high performance on 64-bit hardware (80% faster than
+SHA-256). This implementation also utilizes the new salted checksumming
+functionality in ZFS, which means that the checksum is pre-seeded with a
+secret 256-bit random key (stored on the pool) before being fed the data
+block to be checksummed. Thus the produced checksums are unique to a
+given pool, preventing hash collision attacks on systems with dedup.
+
+When the \fBskein\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBskein\fR checksum on any dataset using the
+\fBzfs set checksum=skein\fR(1M) command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBskein\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBskein\fR are destroyed.
+
+Booting off of pools using \fBskein\fR is \fBNOT\fR supported
+-- any attempt to enable \fBskein\fR on a root pool will fail with an
+error.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBedonr\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.illumos:edonr
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables the use of the Edon-R hash algorithm for checksum,
+including for nopwrite (if compression is also enabled, an overwrite of
+a block whose checksum matches the data being written will be ignored).
+In an abundance of caution, Edon-R cannot be used with dedup
+(without verification).
+
+Edon-R is a very high-performance hash algorithm that was part
+of the NIST SHA-3 competition. It provides extremely high hash
+performance (over 350% faster than SHA-256), but was not selected
+because of its unsuitability as a general purpose secure hash algorithm.
+This implementation utilizes the new salted checksumming functionality
+in ZFS, which means that the checksum is pre-seeded with a secret
+256-bit random key (stored on the pool) before being fed the data block
+to be checksummed. Thus the produced checksums are unique to a given
+pool.
+
+When the \fBedonr\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBedonr\fR checksum on any dataset using the
+\fBzfs set checksum=edonr\fR(1M) command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBedonr\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBedonr\fR are destroyed.
+
+Booting off of pools using \fBedonr\fR is \fBNOT\fR supported
+-- any attempt to enable \fBedonr\fR on a root pool will fail with an
+error.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBuserobj_accounting\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.zfsonlinux:userobj_accounting
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature allows administrators to account the object usage information
+by user and group.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will never
+return to being \fBenabled\fR. Each filesystem will be upgraded automatically
+when remounted, or when new files are created under that filesystem.
+The upgrade can also be started manually on filesystems by running
+`zfs set version=current <pool/fs>`. The upgrade process runs in the background
+and may take a while to complete for filesystems containing a large number of
+files.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBencryption\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.datto:encryption
+READ\-ONLY COMPATIBLE no
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature enables the creation and management of natively encrypted datasets.
+
+This feature becomes \fBactive\fR when an encrypted dataset is created and will
+be returned to the \fBenabled\fR state when all datasets that use this feature
+are destroyed.
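+.sp
+A minimal sketch of creating an encrypted dataset (names and key format are
+illustrative):
+.nf
+
+  zfs create -o encryption=on -o keyformat=passphrase tank/secure
+
+.fi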
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBproject_quota\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID org.zfsonlinux:project_quota
+READ\-ONLY COMPATIBLE yes
+DEPENDENCIES extensible_dataset
+.TE
+
+This feature allows administrators to account the spaces and objects usage
+information against the project identifier (ID).
+
+The project ID is a new object-based attribute. When upgrading an existing
+filesystem, objects without a project ID attribute will be assigned a zero
+project ID. After this feature is enabled, a newly created object will inherit
+its parent directory's project ID if the parent's inherit flag is set (via
+\fBchattr +/-P\fR or \fBzfs project [-s|-C]\fR). Otherwise, the new object's
+project ID will be set to zero. An object's project ID can be changed at
+any time by the owner (or privileged user) via \fBchattr -p $prjid\fR or
+\fBzfs project -p $prjid\fR.
+
+This feature will become \fBactive\fR as soon as it is enabled and will never
+return to being \fBdisabled\fR. Each filesystem will be upgraded automatically
+when remounted or when a new file is created under that filesystem. The upgrade
+can also be triggered on filesystems via `zfs set version=current <pool/fs>`.
+The upgrade process runs in the background and may take a while to complete
+for the filesystems containing a large number of files.
+
+.RE
+
+.SH "SEE ALSO"
+\fBzpool\fR(8)