Diffstat (limited to 'man/man5')
-rw-r--r-- | man/man5/Makefile.am | 4
-rw-r--r-- | man/man5/vdev_id.conf.5 | 188
-rw-r--r-- | man/man5/zfs-events.5 | 929
-rw-r--r-- | man/man5/zfs-module-parameters.5 | 2754
-rw-r--r-- | man/man5/zpool-features.5 | 720
5 files changed, 4595 insertions, 0 deletions
diff --git a/man/man5/Makefile.am b/man/man5/Makefile.am new file mode 100644 index 000000000..4746914c5 --- /dev/null +++ b/man/man5/Makefile.am @@ -0,0 +1,4 @@ +dist_man_MANS = vdev_id.conf.5 zpool-features.5 zfs-module-parameters.5 zfs-events.5 + +install-data-local: + $(INSTALL) -d -m 0755 "$(DESTDIR)$(mandir)/man5" diff --git a/man/man5/vdev_id.conf.5 b/man/man5/vdev_id.conf.5 new file mode 100644 index 000000000..50caa92c0 --- /dev/null +++ b/man/man5/vdev_id.conf.5 @@ -0,0 +1,188 @@ +.TH vdev_id.conf 5 +.SH NAME +vdev_id.conf \- Configuration file for vdev_id +.SH DESCRIPTION +.I vdev_id.conf +is the configuration file for +.BR vdev_id (8). +It controls the default behavior of +.BR vdev_id (8) +while it is mapping a disk device name to an alias. +.PP +The +.I vdev_id.conf +file uses a simple format consisting of a keyword followed by one or +more values on a single line. Any line not beginning with a recognized +keyword is ignored. Comments may optionally begin with a hash +character. + +The following keywords and values are used. +.TP +\fIalias\fR <name> <devlink> +Maps a device link in the /dev directory hierarchy to a new device +name. The udev rule defining the device link must have run prior to +.BR vdev_id (8). +A defined alias takes precedence over a topology-derived name, but the +two naming methods can otherwise coexist. For example, one might name +drives in a JBOD with the sas_direct topology while naming an internal +L2ARC device with an alias. + +\fIname\fR - the name of the link to the device that will by created in +/dev/disk/by-vdev. + +\fIdevlink\fR - the name of the device link that has already been +defined by udev. This may be an absolute path or the base filename. + +.TP +\fIchannel\fR [pci_slot] <port> <name> +Maps a physical path to a channel name (typically representing a single +disk enclosure). + +\fIpci_slot\fR - specifies the PCI SLOT of the HBA +hosting the disk enclosure being mapped, as found in the output of +.BR lspci (8). +This argument is not used in sas_switch mode. + +\fIport\fR - specifies the numeric identifier of the HBA or SAS switch port +connected to the disk enclosure being mapped. + +\fIname\fR - specifies the name of the channel. + +.TP +\fIslot\fR <old> <new> [channel] +Maps a disk slot number as reported by the operating system to an +alternative slot number. If the \fIchannel\fR parameter is specified +then the mapping is only applied to slots in the named channel, +otherwise the mapping is applied to all channels. The first-specified +\fIslot\fR rule that can match a slot takes precedence. Therefore a +channel-specific mapping for a given slot should generally appear before +a generic mapping for the same slot. In this way a custom mapping may +be applied to a particular channel and a default mapping applied to the +others. + +.TP +\fImultipath\fR <yes|no> +Specifies whether +.BR vdev_id (8) +will handle only dm-multipath devices. If set to "yes" then +.BR vdev_id (8) +will examine the first running component disk of a dm-multipath +device as listed by the +.BR multipath (8) +command to determine the physical path. +.TP +\fItopology\fR <sas_direct|sas_switch> +Identifies a physical topology that governs how physical paths are +mapped to channels. 
+ +\fIsas_direct\fR - in this mode a channel is uniquely identified by +a PCI slot and a HBA port number + +\fIsas_switch\fR - in this mode a channel is uniquely identified by +a SAS switch port number + +.TP +\fIphys_per_port\fR <num> +Specifies the number of PHY devices associated with a SAS HBA port or SAS +switch port. +.BR vdev_id (8) +internally uses this value to determine which HBA or switch port a +device is connected to. The default is 4. + +.TP +\fIslot\fR <bay|phy|port|id|lun|ses> +Specifies from which element of a SAS identifier the slot number is +taken. The default is bay. + +\fIbay\fR - read the slot number from the bay identifier. + +\fIphy\fR - read the slot number from the phy identifier. + +\fIport\fR - use the SAS port as the slot number. + +\fIid\fR - use the scsi id as the slot number. + +\fIlun\fR - use the scsi lun as the slot number. + +\fIses\fR - use the SCSI Enclosure Services (SES) enclosure device slot number, +as reported by +.BR sg_ses (8). +This is intended for use only on systems where \fIbay\fR is unsupported, +noting that \fIport\fR and \fIid\fR may be unstable across disk replacement. +.SH EXAMPLES +A non-multipath configuration with direct-attached SAS enclosures and an +arbitrary slot re-mapping. +.P +.nf + multipath no + topology sas_direct + phys_per_port 4 + slot bay + + # PCI_SLOT HBA PORT CHANNEL NAME + channel 85:00.0 1 A + channel 85:00.0 0 B + channel 86:00.0 1 C + channel 86:00.0 0 D + + # Custom mapping for Channel A + + # Linux Mapped + # Slot Slot Channel + slot 1 7 A + slot 2 10 A + slot 3 3 A + slot 4 6 A + + # Default mapping for B, C, and D + + slot 1 4 + slot 2 2 + slot 3 1 + slot 4 3 +.fi +.P +A SAS-switch topology. Note that the +.I channel +keyword takes only two arguments in this example. +.P +.nf + topology sas_switch + + # SWITCH PORT CHANNEL NAME + channel 1 A + channel 2 B + channel 3 C + channel 4 D +.fi +.P +A multipath configuration. Note that channel names have multiple +definitions - one per physical path. +.P +.nf + multipath yes + + # PCI_SLOT HBA PORT CHANNEL NAME + channel 85:00.0 1 A + channel 85:00.0 0 B + channel 86:00.0 1 A + channel 86:00.0 0 B +.fi +.P +A configuration using device link aliases. +.P +.nf + # by-vdev + # name fully qualified or base name of device link + alias d1 /dev/disk/by-id/wwn-0x5000c5002de3b9ca + alias d2 wwn-0x5000c5002def789e +.fi +.P + +.SH FILES +.TP +.I /etc/zfs/vdev_id.conf +The configuration file for +.BR vdev_id (8). +.SH SEE ALSO +.BR vdev_id (8) diff --git a/man/man5/zfs-events.5 b/man/man5/zfs-events.5 new file mode 100644 index 000000000..4c60eecc5 --- /dev/null +++ b/man/man5/zfs-events.5 @@ -0,0 +1,929 @@ +'\" te +.\" Copyright (c) 2013 by Turbo Fredriksson <[email protected]>. All rights reserved. +.\" The contents of this file are subject to the terms of the Common Development +.\" and Distribution License (the "License"). You may not use this file except +.\" in compliance with the License. You can obtain a copy of the license at +.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. +.\" +.\" See the License for the specific language governing permissions and +.\" limitations under the License. When distributing Covered Code, include this +.\" CDDL HEADER in each file and include the License file at +.\" usr/src/OPENSOLARIS.LICENSE. 
If applicable, add the following below this +.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your +.\" own identifying information: +.\" Portions Copyright [yyyy] [name of copyright owner] +.TH ZFS-EVENTS 5 "Jun 6, 2015" +.SH NAME +zfs\-events \- Events created by the ZFS filesystem. +.SH DESCRIPTION +.sp +.LP +Description of the different events generated by the ZFS stack. +.sp +Most of these don't have any description. The events generated by ZFS +have never been publicly documented. What is here is intended as a +starting point to provide documentation for all possible events. +.sp +To view all events created since the loading of the ZFS infrastructure +(i.e, "the module"), run +.P +.nf +\fBzpool events\fR +.fi +.P +to get a short list, and +.P +.nf +\fBzpool events -v\fR +.fi +.P +to get a full detail of the events and what information +is available about it. +.sp +This man page lists the different subclasses that are issued +in the case of an event. The full event name would be +\fIereport.fs.zfs.SUBCLASS\fR, but we only list the last +part here. + +.SS "EVENTS (SUBCLASS)" +.sp +.LP + +.sp +.ne 2 +.na +\fBchecksum\fR +.ad +.RS 12n +Issued when a checksum error has been detected. +.RE + +.sp +.ne 2 +.na +\fBio\fR +.ad +.RS 12n +Issued when there is an I/O error in a vdev in the pool. +.RE + +.sp +.ne 2 +.na +\fBdata\fR +.ad +.RS 12n +Issued when there have been data errors in the pool. +.RE + +.sp +.ne 2 +.na +\fBdeadman\fR +.ad +.RS 12n +Issued when an I/O is determined to be "hung", this can be caused by lost +completion events due to flaky hardware or drivers. See the +\fBzfs_deadman_failmode\fR module option description for additional +information regarding "hung" I/O detection and configuration. +.RE + +.sp +.ne 2 +.na +\fBdelay\fR +.ad +.RS 12n +Issued when a completed I/O exceeds the maximum allowed time specified +by the \fBzio_delay_max\fR module option. This can be an indicator of +problems with the underlying storage device. +.RE + +.sp +.ne 2 +.na +\fBconfig.sync\fR +.ad +.RS 12n +Issued every time a vdev change have been done to the pool. +.RE + +.sp +.ne 2 +.na +\fBzpool\fR +.ad +.RS 12n +Issued when a pool cannot be imported. +.RE + +.sp +.ne 2 +.na +\fBzpool.destroy\fR +.ad +.RS 12n +Issued when a pool is destroyed. +.RE + +.sp +.ne 2 +.na +\fBzpool.export\fR +.ad +.RS 12n +Issued when a pool is exported. +.RE + +.sp +.ne 2 +.na +\fBzpool.import\fR +.ad +.RS 12n +Issued when a pool is imported. +.RE + +.sp +.ne 2 +.na +\fBzpool.reguid\fR +.ad +.RS 12n +Issued when a REGUID (new unique identifier for the pool have been regenerated) have been detected. +.RE + +.sp +.ne 2 +.na +\fBvdev.unknown\fR +.ad +.RS 12n +Issued when the vdev is unknown. Such as trying to clear device +errors on a vdev that have failed/been kicked from the system/pool +and is no longer available. +.RE + +.sp +.ne 2 +.na +\fBvdev.open_failed\fR +.ad +.RS 12n +Issued when a vdev could not be opened (because it didn't exist for example). +.RE + +.sp +.ne 2 +.na +\fBvdev.corrupt_data\fR +.ad +.RS 12n +Issued when corrupt data have been detected on a vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev.no_replicas\fR +.ad +.RS 12n +Issued when there are no more replicas to sustain the pool. +This would lead to the pool being \fIDEGRADED\fR. +.RE + +.sp +.ne 2 +.na +\fBvdev.bad_guid_sum\fR +.ad +.RS 12n +Issued when a missing device in the pool have been detected. 
+.RE + +.sp +.ne 2 +.na +\fBvdev.too_small\fR +.ad +.RS 12n +Issued when the system (kernel) have removed a device, and ZFS +notices that the device isn't there any more. This is usually +followed by a \fBprobe_failure\fR event. +.RE + +.sp +.ne 2 +.na +\fBvdev.bad_label\fR +.ad +.RS 12n +Issued when the label is OK but invalid. +.RE + +.sp +.ne 2 +.na +\fBvdev.bad_ashift\fR +.ad +.RS 12n +Issued when the ashift alignment requirement has increased. +.RE + +.sp +.ne 2 +.na +\fBvdev.remove\fR +.ad +.RS 12n +Issued when a vdev is detached from a mirror (or a spare detached from a +vdev where it have been used to replace a failed drive - only works if +the original drive have been readded). +.RE + +.sp +.ne 2 +.na +\fBvdev.clear\fR +.ad +.RS 12n +Issued when clearing device errors in a pool. Such as running \fBzpool clear\fR +on a device in the pool. +.RE + +.sp +.ne 2 +.na +\fBvdev.check\fR +.ad +.RS 12n +Issued when a check to see if a given vdev could be opened is started. +.RE + +.sp +.ne 2 +.na +\fBvdev.spare\fR +.ad +.RS 12n +Issued when a spare have kicked in to replace a failed device. +.RE + +.sp +.ne 2 +.na +\fBvdev.autoexpand\fR +.ad +.RS 12n +Issued when a vdev can be automatically expanded. +.RE + +.sp +.ne 2 +.na +\fBio_failure\fR +.ad +.RS 12n +Issued when there is an I/O failure in a vdev in the pool. +.RE + +.sp +.ne 2 +.na +\fBprobe_failure\fR +.ad +.RS 12n +Issued when a probe fails on a vdev. This would occur if a vdev +have been kicked from the system outside of ZFS (such as the kernel +have removed the device). +.RE + +.sp +.ne 2 +.na +\fBlog_replay\fR +.ad +.RS 12n +Issued when the intent log cannot be replayed. The can occur in the case +of a missing or damaged log device. +.RE + +.sp +.ne 2 +.na +\fBresilver.start\fR +.ad +.RS 12n +Issued when a resilver is started. +.RE + +.sp +.ne 2 +.na +\fBresilver.finish\fR +.ad +.RS 12n +Issued when the running resilver have finished. +.RE + +.sp +.ne 2 +.na +\fBscrub.start\fR +.ad +.RS 12n +Issued when a scrub is started on a pool. +.RE + +.sp +.ne 2 +.na +\fBscrub.finish\fR +.ad +.RS 12n +Issued when a pool has finished scrubbing. +.RE + +.sp +.ne 2 +.na +\fBscrub.abort\fR +.ad +.RS 12n +Issued when a scrub is aborted on a pool. +.RE + +.sp +.ne 2 +.na +\fBscrub.resume\fR +.ad +.RS 12n +Issued when a scrub is resumed on a pool. +.RE + +.sp +.ne 2 +.na +\fBscrub.paused\fR +.ad +.RS 12n +Issued when a scrub is paused on a pool. +.RE + +.sp +.ne 2 +.na +\fBbootfs.vdev.attach\fR +.ad +.RS 12n +.RE + +.SS "PAYLOADS" +.sp +.LP +This is the payload (data, information) that accompanies an +event. +.sp +For +.BR zed (8), +these are set to uppercase and prefixed with \fBZEVENT_\fR. + +.sp +.ne 2 +.na +\fBpool\fR +.ad +.RS 12n +Pool name. +.RE + +.sp +.ne 2 +.na +\fBpool_failmode\fR +.ad +.RS 12n +Failmode - \fBwait\fR, \fBcontinue\fR or \fBpanic\fR. +See +.BR pool (8) +(\fIfailmode\fR property) for more information. +.RE + +.sp +.ne 2 +.na +\fBpool_guid\fR +.ad +.RS 12n +The GUID of the pool. +.RE + +.sp +.ne 2 +.na +\fBpool_context\fR +.ad +.RS 12n +The load state for the pool (0=none, 1=open, 2=import, 3=tryimport, 4=recover +5=error). +.RE + +.sp +.ne 2 +.na +\fBvdev_guid\fR +.ad +.RS 12n +The GUID of the vdev in question (the vdev failing or operated upon with +\fBzpool clear\fR etc). +.RE + +.sp +.ne 2 +.na +\fBvdev_type\fR +.ad +.RS 12n +Type of vdev - \fBdisk\fR, \fBfile\fR, \fBmirror\fR etc. See +.BR zpool (8) +under \fBVirtual Devices\fR for more information on possible values. 
+.RE + +.sp +.ne 2 +.na +\fBvdev_path\fR +.ad +.RS 12n +Full path of the vdev, including any \fI-partX\fR. +.RE + +.sp +.ne 2 +.na +\fBvdev_devid\fR +.ad +.RS 12n +ID of vdev (if any). +.RE + +.sp +.ne 2 +.na +\fBvdev_fru\fR +.ad +.RS 12n +Physical FRU location. +.RE + +.sp +.ne 2 +.na +\fBvdev_state\fR +.ad +.RS 12n +State of vdev (0=uninitialized, 1=closed, 2=offline, 3=removed, 4=failed to open, 5=faulted, 6=degraded, 7=healthy). +.RE + +.sp +.ne 2 +.na +\fBvdev_ashift\fR +.ad +.RS 12n +The ashift value of the vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev_complete_ts\fR +.ad +.RS 12n +The time the last I/O completed for the specified vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev_delta_ts\fR +.ad +.RS 12n +The time since the last I/O completed for the specified vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev_spare_paths\fR +.ad +.RS 12n +List of spares, including full path and any \fI-partX\fR. +.RE + +.sp +.ne 2 +.na +\fBvdev_spare_guids\fR +.ad +.RS 12n +GUID(s) of spares. +.RE + +.sp +.ne 2 +.na +\fBvdev_read_errors\fR +.ad +.RS 12n +How many read errors that have been detected on the vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev_write_errors\fR +.ad +.RS 12n +How many write errors that have been detected on the vdev. +.RE + +.sp +.ne 2 +.na +\fBvdev_cksum_errors\fR +.ad +.RS 12n +How many checkum errors that have been detected on the vdev. +.RE + +.sp +.ne 2 +.na +\fBparent_guid\fR +.ad +.RS 12n +GUID of the vdev parent. +.RE + +.sp +.ne 2 +.na +\fBparent_type\fR +.ad +.RS 12n +Type of parent. See \fBvdev_type\fR. +.RE + +.sp +.ne 2 +.na +\fBparent_path\fR +.ad +.RS 12n +Path of the vdev parent (if any). +.RE + +.sp +.ne 2 +.na +\fBparent_devid\fR +.ad +.RS 12n +ID of the vdev parent (if any). +.RE + +.sp +.ne 2 +.na +\fBzio_objset\fR +.ad +.RS 12n +The object set number for a given I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_object\fR +.ad +.RS 12n +The object number for a given I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_level\fR +.ad +.RS 12n +The block level for a given I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_blkid\fR +.ad +.RS 12n +The block ID for a given I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_err\fR +.ad +.RS 12n +The errno for a failure when handling a given I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_offset\fR +.ad +.RS 12n +The offset in bytes of where to write the I/O for the specified vdev. +.RE + +.sp +.ne 2 +.na +\fBzio_size\fR +.ad +.RS 12n +The size in bytes of the I/O. +.RE + +.sp +.ne 2 +.na +\fBzio_flags\fR +.ad +.RS 12n +The current flags describing how the I/O should be handled. See the +\fBI/O FLAGS\fR section for the full list of I/O flags. +.RE + +.sp +.ne 2 +.na +\fBzio_stage\fR +.ad +.RS 12n +The current stage of the I/O in the pipeline. See the \fBI/O STAGES\fR +section for a full list of all the I/O stages. +.RE + +.sp +.ne 2 +.na +\fBzio_pipeline\fR +.ad +.RS 12n +The valid pipeline stages for the I/O. See the \fBI/O STAGES\fR section for a +full list of all the I/O stages. +.RE + +.sp +.ne 2 +.na +\fBzio_delay\fR +.ad +.RS 12n +The time in ticks (HZ) required for the block layer to service the I/O. Unlike +\fBzio_delta\fR this does not include any vdev queuing time and is therefore +solely a measure of the block layer performance. On most modern Linux systems +HZ is defined as 1000 making a tick equivalent to 1 millisecond. +.RE + +.sp +.ne 2 +.na +\fBzio_timestamp\fR +.ad +.RS 12n +The time when a given I/O was submitted. +.RE + +.sp +.ne 2 +.na +\fBzio_delta\fR +.ad +.RS 12n +The time required to service a given I/O. +.RE + +.sp +.ne 2 +.na +\fBprev_state\fR +.ad +.RS 12n +The previous state of the vdev. 
+.RE + +.sp +.ne 2 +.na +\fBcksum_expected\fR +.ad +.RS 12n +The expected checksum value. +.RE + +.sp +.ne 2 +.na +\fBcksum_actual\fR +.ad +.RS 12n +The actual/current checksum value. +.RE + +.sp +.ne 2 +.na +\fBcksum_algorithm\fR +.ad +.RS 12n +Checksum algorithm used. See \fBzfs\fR(8) for more information on checksum algorithms available. +.RE + +.sp +.ne 2 +.na +\fBcksum_byteswap\fR +.ad +.RS 12n +Checksum value is byte swapped. +.RE + +.sp +.ne 2 +.na +\fBbad_ranges\fR +.ad +.RS 12n +Checksum bad offset ranges. +.RE + +.sp +.ne 2 +.na +\fBbad_ranges_min_gap\fR +.ad +.RS 12n +Checksum allowed minimum gap. +.RE + +.sp +.ne 2 +.na +\fBbad_range_sets\fR +.ad +.RS 12n +Checksum for each range the number of bits set. +.RE + +.sp +.ne 2 +.na +\fBbad_range_clears\fR +.ad +.RS 12n +Checksum for each range the number of bits cleared. +.RE + +.sp +.ne 2 +.na +\fBbad_set_bits\fR +.ad +.RS 12n +Checksum array of bits set. +.RE + +.sp +.ne 2 +.na +\fBbad_cleared_bits\fR +.ad +.RS 12n +Checksum array of bits cleared. +.RE + +.sp +.ne 2 +.na +\fBbad_set_histogram\fR +.ad +.RS 12n +Checksum histogram of set bits by bit number in a 64-bit word. +.RE + +.sp +.ne 2 +.na +\fBbad_cleared_histogram\fR +.ad +.RS 12n +Checksum histogram of cleared bits by bit number in a 64-bit word. +.RE + +.SS "I/O STAGES" +.sp +.LP +The ZFS I/O pipeline is comprised of various stages which are defined +below. The individual stages are used to construct these basic I/O +operations: Read, Write, Free, Claim, and Ioctl. These stages may be +set on an event to describe the life cycle of a given I/O. + +.TS +tab(:); +l l l . +Stage:Bit Mask:Operations +_:_:_ +ZIO_STAGE_OPEN:0x00000001:RWFCI + +ZIO_STAGE_READ_BP_INIT:0x00000002:R---- +ZIO_STAGE_FREE_BP_INIT:0x00000004:--F-- +ZIO_STAGE_ISSUE_ASYNC:0x00000008:RWF-- +ZIO_STAGE_WRITE_BP_INIT:0x00000010:-W--- + +ZIO_STAGE_CHECKSUM_GENERATE:0x00000020:-W--- + +ZIO_STAGE_NOP_WRITE:0x00000040:-W--- + +ZIO_STAGE_DDT_READ_START:0x00000080:R---- +ZIO_STAGE_DDT_READ_DONE:0x00000100:R---- +ZIO_STAGE_DDT_WRITE:0x00000200:-W--- +ZIO_STAGE_DDT_FREE:0x00000400:--F-- + +ZIO_STAGE_GANG_ASSEMBLE:0x00000800:RWFC- +ZIO_STAGE_GANG_ISSUE:0x00001000:RWFC- + +ZIO_STAGE_DVA_ALLOCATE:0x00002000:-W--- +ZIO_STAGE_DVA_FREE:0x00004000:--F-- +ZIO_STAGE_DVA_CLAIM:0x00008000:---C- + +ZIO_STAGE_READY:0x00010000:RWFCI + +ZIO_STAGE_VDEV_IO_START:0x00020000:RW--I +ZIO_STAGE_VDEV_IO_DONE:0x00040000:RW--I +ZIO_STAGE_VDEV_IO_ASSESS:0x00080000:RW--I + +ZIO_STAGE_CHECKSUM_VERIFY0:0x00100000:R---- + +ZIO_STAGE_DONE:0x00200000:RWFCI +.TE + +.SS "I/O FLAGS" +.sp +.LP +Every I/O in the pipeline contains a set of flags which describe its +function and are used to govern its behavior. These flags will be set +in an event as an \fBzio_flags\fR payload entry. + +.TS +tab(:); +l l . 
+Flag:Bit Mask +_:_ +ZIO_FLAG_DONT_AGGREGATE:0x00000001 +ZIO_FLAG_IO_REPAIR:0x00000002 +ZIO_FLAG_SELF_HEAL:0x00000004 +ZIO_FLAG_RESILVER:0x00000008 +ZIO_FLAG_SCRUB:0x00000010 +ZIO_FLAG_SCAN_THREAD:0x00000020 +ZIO_FLAG_PHYSICAL:0x00000040 + +ZIO_FLAG_CANFAIL:0x00000080 +ZIO_FLAG_SPECULATIVE:0x00000100 +ZIO_FLAG_CONFIG_WRITER:0x00000200 +ZIO_FLAG_DONT_RETRY:0x00000400 +ZIO_FLAG_DONT_CACHE:0x00000800 +ZIO_FLAG_NODATA:0x00001000 +ZIO_FLAG_INDUCE_DAMAGE:0x00002000 + +ZIO_FLAG_IO_RETRY:0x00004000 +ZIO_FLAG_PROBE:0x00008000 +ZIO_FLAG_TRYHARD:0x00010000 +ZIO_FLAG_OPTIONAL:0x00020000 + +ZIO_FLAG_DONT_QUEUE:0x00040000 +ZIO_FLAG_DONT_PROPAGATE:0x00080000 +ZIO_FLAG_IO_BYPASS:0x00100000 +ZIO_FLAG_IO_REWRITE:0x00200000 +ZIO_FLAG_RAW:0x00400000 +ZIO_FLAG_GANG_CHILD:0x00800000 +ZIO_FLAG_DDT_CHILD:0x01000000 +ZIO_FLAG_GODFATHER:0x02000000 +ZIO_FLAG_NOPWRITE:0x04000000 +ZIO_FLAG_REEXECUTED:0x08000000 +ZIO_FLAG_DELEGATED:0x10000000 +ZIO_FLAG_FASTWRITE:0x20000000 +.TE diff --git a/man/man5/zfs-module-parameters.5 b/man/man5/zfs-module-parameters.5 new file mode 100644 index 000000000..dbfa8806a --- /dev/null +++ b/man/man5/zfs-module-parameters.5 @@ -0,0 +1,2754 @@ +'\" te +.\" Copyright (c) 2013 by Turbo Fredriksson <[email protected]>. All rights reserved. +.\" Copyright (c) 2017 Datto Inc. +.\" The contents of this file are subject to the terms of the Common Development +.\" and Distribution License (the "License"). You may not use this file except +.\" in compliance with the License. You can obtain a copy of the license at +.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. +.\" +.\" See the License for the specific language governing permissions and +.\" limitations under the License. When distributing Covered Code, include this +.\" CDDL HEADER in each file and include the License file at +.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this +.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your +.\" own identifying information: +.\" Portions Copyright [yyyy] [name of copyright owner] +.TH ZFS-MODULE-PARAMETERS 5 "Oct 28, 2017" +.SH NAME +zfs\-module\-parameters \- ZFS module parameters +.SH DESCRIPTION +.sp +.LP +Description of the different parameters to the ZFS module. + +.SS "Module parameters" +.sp +.LP + +.sp +.ne 2 +.na +\fBdbuf_cache_max_bytes\fR (ulong) +.ad +.RS 12n +Maximum size in bytes of the dbuf cache. When \fB0\fR this value will default +to \fB1/2^dbuf_cache_shift\fR (1/32) of the target ARC size, otherwise the +provided value in bytes will be used. The behavior of the dbuf cache and its +associated settings can be observed via the \fB/proc/spl/kstat/zfs/dbufstats\fR +kstat. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBdbuf_cache_hiwater_pct\fR (uint) +.ad +.RS 12n +The percentage over \fBdbuf_cache_max_bytes\fR when dbufs must be evicted +directly. +.sp +Default value: \fB10\fR%. +.RE + +.sp +.ne 2 +.na +\fBdbuf_cache_lowater_pct\fR (uint) +.ad +.RS 12n +The percentage below \fBdbuf_cache_max_bytes\fR when the evict thread stops +evicting dbufs. +.sp +Default value: \fB10\fR%. +.RE + +.sp +.ne 2 +.na +\fBdbuf_cache_shift\fR (int) +.ad +.RS 12n +Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction +of the target arc size. +.sp +Default value: \fB5\fR. +.RE + +.sp +.ne 2 +.na +\fBignore_hole_birth\fR (int) +.ad +.RS 12n +When set, the hole_birth optimization will not be used, and all holes will +always be sent on zfs send. 
Useful if you suspect your datasets are affected +by a bug in hole_birth. +.sp +Use \fB1\fR for on (default) and \fB0\fR for off. +.RE + +.sp +.ne 2 +.na +\fBl2arc_feed_again\fR (int) +.ad +.RS 12n +Turbo L2ARC warm-up. When the L2ARC is cold the fill interval will be set as +fast as possible. +.sp +Use \fB1\fR for yes (default) and \fB0\fR to disable. +.RE + +.sp +.ne 2 +.na +\fBl2arc_feed_min_ms\fR (ulong) +.ad +.RS 12n +Min feed interval in milliseconds. Requires \fBl2arc_feed_again=1\fR and only +applicable in related situations. +.sp +Default value: \fB200\fR. +.RE + +.sp +.ne 2 +.na +\fBl2arc_feed_secs\fR (ulong) +.ad +.RS 12n +Seconds between L2ARC writing +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBl2arc_headroom\fR (ulong) +.ad +.RS 12n +How far through the ARC lists to search for L2ARC cacheable content, expressed +as a multiplier of \fBl2arc_write_max\fR +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBl2arc_headroom_boost\fR (ulong) +.ad +.RS 12n +Scales \fBl2arc_headroom\fR by this percentage when L2ARC contents are being +successfully compressed before writing. A value of 100 disables this feature. +.sp +Default value: \fB200\fR%. +.RE + +.sp +.ne 2 +.na +\fBl2arc_noprefetch\fR (int) +.ad +.RS 12n +Do not write buffers to L2ARC if they were prefetched but not used by +applications +.sp +Use \fB1\fR for yes (default) and \fB0\fR to disable. +.RE + +.sp +.ne 2 +.na +\fBl2arc_norw\fR (int) +.ad +.RS 12n +No reads during writes +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBl2arc_write_boost\fR (ulong) +.ad +.RS 12n +Cold L2ARC devices will have \fBl2arc_write_max\fR increased by this amount +while they remain cold. +.sp +Default value: \fB8,388,608\fR. +.RE + +.sp +.ne 2 +.na +\fBl2arc_write_max\fR (ulong) +.ad +.RS 12n +Max write bytes per interval +.sp +Default value: \fB8,388,608\fR. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_aliquot\fR (ulong) +.ad +.RS 12n +Metaslab granularity, in bytes. This is roughly similar to what would be +referred to as the "stripe size" in traditional RAID arrays. In normal +operation, ZFS will try to write this amount of data to a top-level vdev +before moving on to the next one. +.sp +Default value: \fB524,288\fR. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_bias_enabled\fR (int) +.ad +.RS 12n +Enable metaslab group biasing based on its vdev's over- or under-utilization +relative to the pool. +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_force_ganging\fR (ulong) +.ad +.RS 12n +Make some blocks above a certain size be gang blocks. This option is used +by the test suite to facilitate testing. +.sp +Default value: \fB16,777,217\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_metaslab_segment_weight_enabled\fR (int) +.ad +.RS 12n +Enable/disable segment-based metaslab selection. +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBzfs_metaslab_switch_threshold\fR (int) +.ad +.RS 12n +When using segment-based metaslab selection, continue allocating +from the active metaslab until \fBzfs_metaslab_switch_threshold\fR +worth of buckets have been exhausted. +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_debug_load\fR (int) +.ad +.RS 12n +Load all metaslabs during pool import. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBmetaslab_debug_unload\fR (int) +.ad +.RS 12n +Prevent metaslabs from being unloaded. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). 
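+.sp
+As an illustration only (assuming the parameter is exposed through the
+standard Linux module parameter interface under
+\fB/sys/module/zfs/parameters\fR, which is not stated on this page), such a
+debug toggle could be flipped and inspected at runtime:
+.P
+.nf
+  # hypothetical runtime toggle; path and writability are assumptions
+  echo 1 > /sys/module/zfs/parameters/metaslab_debug_unload
+  cat /sys/module/zfs/parameters/metaslab_debug_unload
+.fi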
+.RE + +.sp +.ne 2 +.na +\fBmetaslab_fragmentation_factor_enabled\fR (int) +.ad +.RS 12n +Enable use of the fragmentation metric in computing metaslab weights. +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBmetaslabs_per_vdev\fR (int) +.ad +.RS 12n +When a vdev is added, it will be divided into approximately (but no more than) this number of metaslabs. +.sp +Default value: \fB200\fR. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_preload_enabled\fR (int) +.ad +.RS 12n +Enable metaslab group preloading. +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBmetaslab_lba_weighting_enabled\fR (int) +.ad +.RS 12n +Give more weight to metaslabs with lower LBAs, assuming they have +greater bandwidth as is typically the case on a modern constant +angular velocity disk drive. +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBspa_config_path\fR (charp) +.ad +.RS 12n +SPA config file +.sp +Default value: \fB/etc/zfs/zpool.cache\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_asize_inflation\fR (int) +.ad +.RS 12n +Multiplication factor used to estimate actual disk consumption from the +size of data being written. The default value is a worst case estimate, +but lower values may be valid for a given pool depending on its +configuration. Pool administrators who understand the factors involved +may wish to specify a more realistic inflation factor, particularly if +they operate close to quota or capacity limits. +.sp +Default value: \fB24\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_load_print_vdev_tree\fR (int) +.ad +.RS 12n +Whether to print the vdev tree in the debugging message buffer during pool import. +Use 0 to disable and 1 to enable. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_load_verify_data\fR (int) +.ad +.RS 12n +Whether to traverse data blocks during an "extreme rewind" (\fB-X\fR) +import. Use 0 to disable and 1 to enable. + +An extreme rewind import normally performs a full traversal of all +blocks in the pool for verification. If this parameter is set to 0, +the traversal skips non-metadata blocks. It can be toggled once the +import has started to stop or start the traversal of non-metadata blocks. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_load_verify_metadata\fR (int) +.ad +.RS 12n +Whether to traverse blocks during an "extreme rewind" (\fB-X\fR) +pool import. Use 0 to disable and 1 to enable. + +An extreme rewind import normally performs a full traversal of all +blocks in the pool for verification. If this parameter is set to 0, +the traversal is not performed. It can be toggled once the import has +started to stop or start the traversal. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_load_verify_maxinflight\fR (int) +.ad +.RS 12n +Maximum concurrent I/Os during the traversal performed during an "extreme +rewind" (\fB-X\fR) pool import. +.sp +Default value: \fB10000\fR. +.RE + +.sp +.ne 2 +.na +\fBspa_slop_shift\fR (int) +.ad +.RS 12n +Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift)) of space +in the pool to be consumed. This ensures that we don't run the pool +completely out of space, due to unaccounted changes (e.g. to the MOS). +It also limits the worst-case time to allocate space. If we have +less than this amount of free space, most ZPL operations (e.g. write, +create) will return ENOSPC. +.sp +Default value: \fB5\fR. 
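+.sp
+As a worked example of the fraction given above (using only the numbers on
+this page):
+.P
+.nf
+  spa_slop_shift = 5  ->  1/2^5 = 1/32  (about 3.1% of pool space reserved)
+  spa_slop_shift = 6  ->  1/2^6 = 1/64  (about 1.6% of pool space reserved)
+.fi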
+.RE + +.sp +.ne 2 +.na +\fBvdev_removal_max_span\fR (int) +.ad +.RS 12n +During top-level vdev removal, chunks of data are copied from the vdev +which may include free space in order to trade bandwidth for IOPS. +This parameter determines the maximum span of free space (in bytes) +which will be included as "unnecessary" data in a chunk of copied data. + +The default value here was chosen to align with +\fBzfs_vdev_read_gap_limit\fR, which is a similar concept when doing +regular reads (but there's no reason it has to be the same). +.sp +Default value: \fB32,768\fR. +.RE + +.sp +.ne 2 +.na +\fBzfetch_array_rd_sz\fR (ulong) +.ad +.RS 12n +If prefetching is enabled, disable prefetching for reads larger than this size. +.sp +Default value: \fB1,048,576\fR. +.RE + +.sp +.ne 2 +.na +\fBzfetch_max_distance\fR (uint) +.ad +.RS 12n +Max bytes to prefetch per stream (default 8MB). +.sp +Default value: \fB8,388,608\fR. +.RE + +.sp +.ne 2 +.na +\fBzfetch_max_streams\fR (uint) +.ad +.RS 12n +Max number of streams per zfetch (prefetch streams per file). +.sp +Default value: \fB8\fR. +.RE + +.sp +.ne 2 +.na +\fBzfetch_min_sec_reap\fR (uint) +.ad +.RS 12n +Min time before an active prefetch stream can be reclaimed +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_dnode_limit\fR (ulong) +.ad +.RS 12n +When the number of bytes consumed by dnodes in the ARC exceeds this number of +bytes, try to unpin some of it in response to demand for non-metadata. This +value acts as a ceiling to the amount of dnode metadata, and defaults to 0 which +indicates that a percent which is based on \fBzfs_arc_dnode_limit_percent\fR of +the ARC meta buffers that may be used for dnodes. + +See also \fBzfs_arc_meta_prune\fR which serves a similar purpose but is used +when the amount of metadata in the ARC exceeds \fBzfs_arc_meta_limit\fR rather +than in response to overall demand for non-metadata. + +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_dnode_limit_percent\fR (ulong) +.ad +.RS 12n +Percentage that can be consumed by dnodes of ARC meta buffers. +.sp +See also \fBzfs_arc_dnode_limit\fR which serves a similar purpose but has a +higher priority if set to nonzero value. +.sp +Default value: \fB10\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_dnode_reduce_percent\fR (ulong) +.ad +.RS 12n +Percentage of ARC dnodes to try to scan in response to demand for non-metadata +when the number of bytes consumed by dnodes exceeds \fBzfs_arc_dnode_limit\fR. + +.sp +Default value: \fB10\fR% of the number of dnodes in the ARC. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_average_blocksize\fR (int) +.ad +.RS 12n +The ARC's buffer hash table is sized based on the assumption of an average +block size of \fBzfs_arc_average_blocksize\fR (default 8K). This works out +to roughly 1MB of hash table per 1GB of physical memory with 8-byte pointers. +For configurations with a known larger average block size this value can be +increased to reduce the memory footprint. + +.sp +Default value: \fB8192\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_evict_batch_limit\fR (int) +.ad +.RS 12n +Number ARC headers to evict per sub-list before proceeding to another sub-list. +This batch-style operation prevents entire sub-lists from being evicted at once +but comes at a cost of additional unlocking and locking. +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_grow_retry\fR (int) +.ad +.RS 12n +If set to a non zero value, it will replace the arc_grow_retry value with this value. 
+The arc_grow_retry value (default 5) is the number of seconds the ARC will wait before +trying to resume growth after a memory pressure event. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_lotsfree_percent\fR (int) +.ad +.RS 12n +Throttle I/O when free system memory drops below this percentage of total +system memory. Setting this value to 0 will disable the throttle. +.sp +Default value: \fB10\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_max\fR (ulong) +.ad +.RS 12n +Max arc size of ARC in bytes. If set to 0 then it will consume 1/2 of system +RAM. This value must be at least 67108864 (64 megabytes). +.sp +This value can be changed dynamically with some caveats. It cannot be set back +to 0 while running and reducing it below the current ARC size will not cause +the ARC to shrink without memory pressure to induce shrinking. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_adjust_restarts\fR (ulong) +.ad +.RS 12n +The number of restart passes to make while scanning the ARC attempting +the free buffers in order to stay below the \fBzfs_arc_meta_limit\fR. +This value should not need to be tuned but is available to facilitate +performance analysis. +.sp +Default value: \fB4096\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_limit\fR (ulong) +.ad +.RS 12n +The maximum allowed size in bytes that meta data buffers are allowed to +consume in the ARC. When this limit is reached meta data buffers will +be reclaimed even if the overall arc_c_max has not been reached. This +value defaults to 0 which indicates that a percent which is based on +\fBzfs_arc_meta_limit_percent\fR of the ARC may be used for meta data. +.sp +This value my be changed dynamically except that it cannot be set back to 0 +for a specific percent of the ARC; it must be set to an explicit value. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_limit_percent\fR (ulong) +.ad +.RS 12n +Percentage of ARC buffers that can be used for meta data. + +See also \fBzfs_arc_meta_limit\fR which serves a similar purpose but has a +higher priority if set to nonzero value. + +.sp +Default value: \fB75\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_min\fR (ulong) +.ad +.RS 12n +The minimum allowed size in bytes that meta data buffers may consume in +the ARC. This value defaults to 0 which disables a floor on the amount +of the ARC devoted meta data. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_prune\fR (int) +.ad +.RS 12n +The number of dentries and inodes to be scanned looking for entries +which can be dropped. This may be required when the ARC reaches the +\fBzfs_arc_meta_limit\fR because dentries and inodes can pin buffers +in the ARC. Increasing this value will cause to dentry and inode caches +to be pruned more aggressively. Setting this value to 0 will disable +pruning the inode and dentry caches. +.sp +Default value: \fB10,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_meta_strategy\fR (int) +.ad +.RS 12n +Define the strategy for ARC meta data buffer eviction (meta reclaim strategy). +A value of 0 (META_ONLY) will evict only the ARC meta data buffers. +A value of 1 (BALANCED) indicates that additional data buffers may be evicted if +that is required to in order to evict the required number of meta data buffers. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_min\fR (ulong) +.ad +.RS 12n +Min arc size of ARC in bytes. If set to 0 then arc_c_min will default to +consuming the larger of 32M or 1/32 of total system memory. 
+.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_min_prefetch_ms\fR (int) +.ad +.RS 12n +Minimum time prefetched blocks are locked in the ARC, specified in ms. +A value of \fB0\fR will default to 1000 ms. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_min_prescient_prefetch_ms\fR (int) +.ad +.RS 12n +Minimum time "prescient prefetched" blocks are locked in the ARC, specified +in ms. These blocks are meant to be prefetched fairly aggresively ahead of +the code that may use them. A value of \fB0\fR will default to 6000 ms. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_max_missing_tvds\fR (int) +.ad +.RS 12n +Number of missing top-level vdevs which will be allowed during +pool import (only in read-only mode). +.sp +Default value: \fB0\fR +.RE + +.sp +.ne 2 +.na +\fBzfs_multilist_num_sublists\fR (int) +.ad +.RS 12n +To allow more fine-grained locking, each ARC state contains a series +of lists for both data and meta data objects. Locking is performed at +the level of these "sub-lists". This parameters controls the number of +sub-lists per ARC state, and also applies to other uses of the +multilist data structure. +.sp +Default value: \fB4\fR or the number of online CPUs, whichever is greater +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_overflow_shift\fR (int) +.ad +.RS 12n +The ARC size is considered to be overflowing if it exceeds the current +ARC target size (arc_c) by a threshold determined by this parameter. +The threshold is calculated as a fraction of arc_c using the formula +"arc_c >> \fBzfs_arc_overflow_shift\fR". + +The default value of 8 causes the ARC to be considered to be overflowing +if it exceeds the target size by 1/256th (0.3%) of the target size. + +When the ARC is overflowing, new buffer allocations are stalled until +the reclaim thread catches up and the overflow condition no longer exists. +.sp +Default value: \fB8\fR. +.RE + +.sp +.ne 2 +.na + +\fBzfs_arc_p_min_shift\fR (int) +.ad +.RS 12n +If set to a non zero value, this will update arc_p_min_shift (default 4) +with the new value. +arc_p_min_shift is used to shift of arc_c for calculating both min and max +max arc_p +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_p_dampener_disable\fR (int) +.ad +.RS 12n +Disable arc_p adapt dampener +.sp +Use \fB1\fR for yes (default) and \fB0\fR to disable. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_shrink_shift\fR (int) +.ad +.RS 12n +If set to a non zero value, this will update arc_shrink_shift (default 7) +with the new value. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_pc_percent\fR (uint) +.ad +.RS 12n +Percent of pagecache to reclaim arc to + +This tunable allows ZFS arc to play more nicely with the kernel's LRU +pagecache. It can guarantee that the arc size won't collapse under scanning +pressure on the pagecache, yet still allows arc to be reclaimed down to +zfs_arc_min if necessary. This value is specified as percent of pagecache +size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This +only operates during memory pressure/reclaim. +.sp +Default value: \fB0\fR% (disabled). +.RE + +.sp +.ne 2 +.na +\fBzfs_arc_sys_free\fR (ulong) +.ad +.RS 12n +The target number of bytes the ARC should leave as free memory on the system. +Defaults to the larger of 1/64 of physical memory or 512K. Setting this +option to a non-zero value will override the default. +.sp +Default value: \fB0\fR. 
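+.sp
+As an illustrative calculation based on the default rule above: on a system
+with 64 GiB of RAM the default target is the larger of 64 GiB / 64 = 1 GiB
+and 512K, i.e. 1 GiB; setting \fBzfs_arc_sys_free\fR to 2147483648 (2 GiB)
+would override that target. The byte value here is an example only.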
+.RE + +.sp +.ne 2 +.na +\fBzfs_autoimport_disable\fR (int) +.ad +.RS 12n +Disable pool import at module load by ignoring the cache file (typically \fB/etc/zfs/zpool.cache\fR). +.sp +Use \fB1\fR for yes (default) and \fB0\fR for no. +.RE + +.sp +.ne 2 +.na +\fBzfs_checksums_per_second\fR (int) +.ad +.RS 12n +Rate limit checksum events to this many per second. Note that this should +not be set below the zed thresholds (currently 10 checksums over 10 sec) +or else zed may not trigger any action. +.sp +Default value: 20 +.RE + +.sp +.ne 2 +.na +\fBzfs_commit_timeout_pct\fR (int) +.ad +.RS 12n +This controls the amount of time that a ZIL block (lwb) will remain "open" +when it isn't "full", and it has a thread waiting for it to be committed to +stable storage. The timeout is scaled based on a percentage of the last lwb +latency to avoid significantly impacting the latency of each individual +transaction record (itx). +.sp +Default value: \fB5\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_condense_indirect_vdevs_enable\fR (int) +.ad +.RS 12n +Enable condensing indirect vdev mappings. When set to a non-zero value, +attempt to condense indirect vdev mappings if the mapping uses more than +\fBzfs_condense_min_mapping_bytes\fR bytes of memory and if the obsolete +space map object uses more than \fBzfs_condense_max_obsolete_bytes\fR +bytes on-disk. The condensing process is an attempt to save memory by +removing obsolete mappings. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_condense_max_obsolete_bytes\fR (ulong) +.ad +.RS 12n +Only attempt to condense indirect vdev mappings if the on-disk size +of the obsolete space map object is greater than this number of bytes +(see \fBfBzfs_condense_indirect_vdevs_enable\fR). +.sp +Default value: \fB1,073,741,824\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_condense_min_mapping_bytes\fR (ulong) +.ad +.RS 12n +Minimum size vdev mapping to attempt to condense (see +\fBzfs_condense_indirect_vdevs_enable\fR). +.sp +Default value: \fB131,072\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dbgmsg_enable\fR (int) +.ad +.RS 12n +Internally ZFS keeps a small log to facilitate debugging. By default the log +is disabled, to enable it set this option to 1. The contents of the log can +be accessed by reading the /proc/spl/kstat/zfs/dbgmsg file. Writing 0 to +this proc file clears the log. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dbgmsg_maxsize\fR (int) +.ad +.RS 12n +The maximum size in bytes of the internal ZFS debug log. +.sp +Default value: \fB4M\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dbuf_state_index\fR (int) +.ad +.RS 12n +This feature is currently unused. It is normally used for controlling what +reporting is available under /proc/spl/kstat/zfs. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_deadman_enabled\fR (int) +.ad +.RS 12n +When a pool sync operation takes longer than \fBzfs_deadman_synctime_ms\fR +milliseconds, or when an individual I/O takes longer than +\fBzfs_deadman_ziotime_ms\fR milliseconds, then the operation is considered to +be "hung". If \fBzfs_deadman_enabled\fR is set then the deadman behavior is +invoked as described by the \fBzfs_deadman_failmode\fR module option. +By default the deadman is enabled and configured to \fBwait\fR which results +in "hung" I/Os only being logged. The deadman is automatically disabled +when a pool gets suspended. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_deadman_failmode\fR (charp) +.ad +.RS 12n +Controls the failure behavior when the deadman detects a "hung" I/O. 
Valid +values are \fBwait\fR, \fBcontinue\fR, and \fBpanic\fR. +.sp +\fBwait\fR - Wait for a "hung" I/O to complete. For each "hung" I/O a +"deadman" event will be posted describing that I/O. +.sp +\fBcontinue\fR - Attempt to recover from a "hung" I/O by re-dispatching it +to the I/O pipeline if possible. +.sp +\fBpanic\fR - Panic the system. This can be used to facilitate an automatic +fail-over to a properly configured fail-over partner. +.sp +Default value: \fBwait\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_deadman_checktime_ms\fR (int) +.ad +.RS 12n +Check time in milliseconds. This defines the frequency at which we check +for hung I/O and potentially invoke the \fBzfs_deadman_failmode\fR behavior. +.sp +Default value: \fB60,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_deadman_synctime_ms\fR (ulong) +.ad +.RS 12n +Interval in milliseconds after which the deadman is triggered and also +the interval after which a pool sync operation is considered to be "hung". +Once this limit is exceeded the deadman will be invoked every +\fBzfs_deadman_checktime_ms\fR milliseconds until the pool sync completes. +.sp +Default value: \fB600,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_deadman_ziotime_ms\fR (ulong) +.ad +.RS 12n +Interval in milliseconds after which the deadman is triggered and an +individual IO operation is considered to be "hung". As long as the I/O +remains "hung" the deadman will be invoked every \fBzfs_deadman_checktime_ms\fR +milliseconds until the I/O completes. +.sp +Default value: \fB300,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dedup_prefetch\fR (int) +.ad +.RS 12n +Enable prefetching dedup-ed blks +.sp +Use \fB1\fR for yes and \fB0\fR to disable (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_delay_min_dirty_percent\fR (int) +.ad +.RS 12n +Start to delay each transaction once there is this amount of dirty data, +expressed as a percentage of \fBzfs_dirty_data_max\fR. +This value should be >= zfs_vdev_async_write_active_max_dirty_percent. +See the section "ZFS TRANSACTION DELAY". +.sp +Default value: \fB60\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_delay_scale\fR (int) +.ad +.RS 12n +This controls how quickly the transaction delay approaches infinity. +Larger values cause longer delays for a given amount of dirty data. +.sp +For the smoothest delay, this value should be about 1 billion divided +by the maximum number of operations per second. This will smoothly +handle between 10x and 1/10th this number. +.sp +See the section "ZFS TRANSACTION DELAY". +.sp +Note: \fBzfs_delay_scale\fR * \fBzfs_dirty_data_max\fR must be < 2^64. +.sp +Default value: \fB500,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_delays_per_second\fR (int) +.ad +.RS 12n +Rate limit IO delay events to this many per second. +.sp +Default value: 20 +.RE + +.sp +.ne 2 +.na +\fBzfs_delete_blocks\fR (ulong) +.ad +.RS 12n +This is the used to define a large file for the purposes of delete. Files +containing more than \fBzfs_delete_blocks\fR will be deleted asynchronously +while smaller files are deleted synchronously. Decreasing this value will +reduce the time spent in an unlink(2) system call at the expense of a longer +delay before the freed space is available. +.sp +Default value: \fB20,480\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dirty_data_max\fR (int) +.ad +.RS 12n +Determines the dirty space limit in bytes. Once this limit is exceeded, new +writes are halted until space frees up. This parameter takes precedence +over \fBzfs_dirty_data_max_percent\fR. +See the section "ZFS TRANSACTION DELAY". 
+.sp +Default value: \fB10\fR% of physical RAM, capped at \fBzfs_dirty_data_max_max\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dirty_data_max_max\fR (int) +.ad +.RS 12n +Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed in bytes. +This limit is only enforced at module load time, and will be ignored if +\fBzfs_dirty_data_max\fR is later changed. This parameter takes +precedence over \fBzfs_dirty_data_max_max_percent\fR. See the section +"ZFS TRANSACTION DELAY". +.sp +Default value: \fB25\fR% of physical RAM. +.RE + +.sp +.ne 2 +.na +\fBzfs_dirty_data_max_max_percent\fR (int) +.ad +.RS 12n +Maximum allowable value of \fBzfs_dirty_data_max\fR, expressed as a +percentage of physical RAM. This limit is only enforced at module load +time, and will be ignored if \fBzfs_dirty_data_max\fR is later changed. +The parameter \fBzfs_dirty_data_max_max\fR takes precedence over this +one. See the section "ZFS TRANSACTION DELAY". +.sp +Default value: \fB25\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_dirty_data_max_percent\fR (int) +.ad +.RS 12n +Determines the dirty space limit, expressed as a percentage of all +memory. Once this limit is exceeded, new writes are halted until space frees +up. The parameter \fBzfs_dirty_data_max\fR takes precedence over this +one. See the section "ZFS TRANSACTION DELAY". +.sp +Default value: \fB10\fR%, subject to \fBzfs_dirty_data_max_max\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_dirty_data_sync\fR (int) +.ad +.RS 12n +Start syncing out a transaction group if there is at least this much dirty data. +.sp +Default value: \fB67,108,864\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_fletcher_4_impl\fR (string) +.ad +.RS 12n +Select a fletcher 4 implementation. +.sp +Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR, +\fBavx2\fR, \fBavx512f\fR, and \fBaarch64_neon\fR. +All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction +set extensions to be available and will only appear if ZFS detects that they are +present at runtime. If multiple implementations of fletcher 4 are available, +the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR +results in the original, CPU based calculation, being used. Selecting any option +other than \fBfastest\fR and \fBscalar\fR results in vector instructions from +the respective CPU instruction set being used. +.sp +Default value: \fBfastest\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_free_bpobj_enabled\fR (int) +.ad +.RS 12n +Enable/disable the processing of the free_bpobj object. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_async_block_max_blocks\fR (ulong) +.ad +.RS 12n +Maximum number of blocks freed in a single txg. +.sp +Default value: \fB100,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_override_estimate_recordsize\fR (ulong) +.ad +.RS 12n +Record size calculation override for zfs send estimates. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_read_max_active\fR (int) +.ad +.RS 12n +Maximum asynchronous read I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB3\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_read_min_active\fR (int) +.ad +.RS 12n +Minimum asynchronous read I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB1\fR. 
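+.sp
+As a sketch only (the file name and values below are arbitrary examples, and
+the standard modprobe configuration mechanism is assumed), scheduler limits
+such as this one can be set persistently for module load:
+.P
+.nf
+  # /etc/modprobe.d/zfs.conf -- hypothetical example values
+  options zfs zfs_vdev_async_read_min_active=2 zfs_vdev_async_read_max_active=6
+.fi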
+.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_write_active_max_dirty_percent\fR (int) +.ad +.RS 12n +When the pool has more than +\fBzfs_vdev_async_write_active_max_dirty_percent\fR dirty data, use +\fBzfs_vdev_async_write_max_active\fR to limit active async writes. If +the dirty data is between min and max, the active I/O limit is linearly +interpolated. See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB60\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_write_active_min_dirty_percent\fR (int) +.ad +.RS 12n +When the pool has less than +\fBzfs_vdev_async_write_active_min_dirty_percent\fR dirty data, use +\fBzfs_vdev_async_write_min_active\fR to limit active async writes. If +the dirty data is between min and max, the active I/O limit is linearly +interpolated. See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB30\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_write_max_active\fR (int) +.ad +.RS 12n +Maximum asynchronous write I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_async_write_min_active\fR (int) +.ad +.RS 12n +Minimum asynchronous write I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Lower values are associated with better latency on rotational media but poorer +resilver performance. The default value of 2 was chosen as a compromise. A +value of 3 has been shown to improve resilver performance further at a cost of +further increasing latency. +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_max_active\fR (int) +.ad +.RS 12n +The maximum number of I/Os active to each device. Ideally, this will be >= +the sum of each queue's max_active. It must be at least the sum of each +queue's min_active. See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB1,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_scrub_max_active\fR (int) +.ad +.RS 12n +Maximum scrub I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_scrub_min_active\fR (int) +.ad +.RS 12n +Minimum scrub I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_sync_read_max_active\fR (int) +.ad +.RS 12n +Maximum synchronous read I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_sync_read_min_active\fR (int) +.ad +.RS 12n +Minimum synchronous read I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_sync_write_max_active\fR (int) +.ad +.RS 12n +Maximum synchronous write I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_sync_write_min_active\fR (int) +.ad +.RS 12n +Minimum synchronous write I/Os active to each device. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_queue_depth_pct\fR (int) +.ad +.RS 12n +Maximum number of queued allocations per top-level vdev expressed as +a percentage of \fBzfs_vdev_async_write_max_active\fR which allows the +system to detect devices that are more capable of handling allocations +and to allocate more blocks to those devices. It allows for dynamic +allocation distribution when devices are imbalanced as fuller devices +will tend to be slower than empty devices. 
+ +See also \fBzio_dva_throttle_enabled\fR. +.sp +Default value: \fB1000\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_expire_snapshot\fR (int) +.ad +.RS 12n +Seconds to expire .zfs/snapshot +.sp +Default value: \fB300\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_admin_snapshot\fR (int) +.ad +.RS 12n +Allow the creation, removal, or renaming of entries in the .zfs/snapshot +directory to cause the creation, destruction, or renaming of snapshots. +When enabled this functionality works both locally and over NFS exports +which have the 'no_root_squash' option set. This functionality is disabled +by default. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_flags\fR (int) +.ad +.RS 12n +Set additional debugging flags. The following flags may be bitwise-or'd +together. +.sp +.TS +box; +rB lB +lB lB +r l. +Value Symbolic Name + Description +_ +1 ZFS_DEBUG_DPRINTF + Enable dprintf entries in the debug log. +_ +2 ZFS_DEBUG_DBUF_VERIFY * + Enable extra dbuf verifications. +_ +4 ZFS_DEBUG_DNODE_VERIFY * + Enable extra dnode verifications. +_ +8 ZFS_DEBUG_SNAPNAMES + Enable snapshot name verification. +_ +16 ZFS_DEBUG_MODIFY + Check for illegally modified ARC buffers. +_ +64 ZFS_DEBUG_ZIO_FREE + Enable verification of block frees. +_ +128 ZFS_DEBUG_HISTOGRAM_VERIFY + Enable extra spacemap histogram verifications. +_ +256 ZFS_DEBUG_METASLAB_VERIFY + Verify space accounting on disk matches in-core range_trees. +_ +512 ZFS_DEBUG_SET_ERROR + Enable SET_ERROR and dprintf entries in the debug log. +.TE +.sp +* Requires debug build. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_free_leak_on_eio\fR (int) +.ad +.RS 12n +If destroy encounters an EIO while reading metadata (e.g. indirect +blocks), space referenced by the missing metadata can not be freed. +Normally this causes the background destroy to become "stalled", as +it is unable to make forward progress. While in this stalled state, +all remaining space to free from the error-encountering filesystem is +"temporarily leaked". Set this flag to cause it to ignore the EIO, +permanently leak the space from indirect blocks that can not be read, +and continue to free everything else that it can. + +The default, "stalling" behavior is useful if the storage partially +fails (i.e. some but not all i/os fail), and then later recovers. In +this case, we will be able to continue pool operations while it is +partially failed, and when it recovers, we can continue to free the +space, with no leaks. However, note that this case is actually +fairly rare. + +Typically pools either (a) fail completely (but perhaps temporarily, +e.g. a top-level vdev going offline), or (b) have localized, +permanent errors (e.g. disk returns the wrong data due to bit flip or +firmware bug). In case (a), this setting does not matter because the +pool will be suspended and the sync thread will not be able to make +forward progress regardless. In case (b), because the error is +permanent, the best we can do is leak the minimum amount of space, +which is what setting this flag will do. Therefore, it is reasonable +for this flag to normally be set, but we chose the more conservative +approach of not setting it, so that there is no possibility of +leaking space in the "partial temporary" failure case. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_free_min_time_ms\fR (int) +.ad +.RS 12n +During a \fBzfs destroy\fR operation using \fBfeature@async_destroy\fR a minimum +of this much time will be spent working on freeing blocks per txg. 
+.sp +Default value: \fB1,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_immediate_write_sz\fR (long) +.ad +.RS 12n +Largest data block to write to zil. Larger blocks will be treated as if the +dataset being written to had the property setting \fBlogbias=throughput\fR. +.sp +Default value: \fB32,768\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_max_recordsize\fR (int) +.ad +.RS 12n +We currently support block sizes from 512 bytes to 16MB. The benefits of +larger blocks, and thus larger IO, need to be weighed against the cost of +COWing a giant block to modify one byte. Additionally, very large blocks +can have an impact on i/o latency, and also potentially on the memory +allocator. Therefore, we do not allow the recordsize to be set larger than +zfs_max_recordsize (default 1MB). Larger blocks can be created by changing +this tunable, and pools with larger blocks can always be imported and used, +regardless of this setting. +.sp +Default value: \fB1,048,576\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_metaslab_fragmentation_threshold\fR (int) +.ad +.RS 12n +Allow metaslabs to keep their active state as long as their fragmentation +percentage is less than or equal to this value. An active metaslab that +exceeds this threshold will no longer keep its active status allowing +better metaslabs to be selected. +.sp +Default value: \fB70\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_mg_fragmentation_threshold\fR (int) +.ad +.RS 12n +Metaslab groups are considered eligible for allocations if their +fragmentation metric (measured as a percentage) is less than or equal to +this value. If a metaslab group exceeds this threshold then it will be +skipped unless all metaslab groups within the metaslab class have also +crossed this threshold. +.sp +Default value: \fB85\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_mg_noalloc_threshold\fR (int) +.ad +.RS 12n +Defines a threshold at which metaslab groups should be eligible for +allocations. The value is expressed as a percentage of free space +beyond which a metaslab group is always eligible for allocations. +If a metaslab group's free space is less than or equal to the +threshold, the allocator will avoid allocating to that group +unless all groups in the pool have reached the threshold. Once all +groups have reached the threshold, all groups are allowed to accept +allocations. The default value of 0 disables the feature and causes +all metaslab groups to be eligible for allocations. + +This parameter allows one to deal with pools having heavily imbalanced +vdevs such as would be the case when a new vdev has been added. +Setting the threshold to a non-zero percentage will stop allocations +from being made to vdevs that aren't filled to the specified percentage +and allow lesser filled vdevs to acquire more allocations than they +otherwise would under the old \fBzfs_mg_alloc_failures\fR facility. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_multihost_history\fR (int) +.ad +.RS 12n +Historical statistics for the last N multihost updates will be available in +\fB/proc/spl/kstat/zfs/<pool>/multihost\fR +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_multihost_interval\fR (ulong) +.ad +.RS 12n +Used to control the frequency of multihost writes which are performed when the +\fBmultihost\fR pool property is on. This is one factor used to determine +the length of the activity check during import. +.sp +The multihost write period is \fBzfs_multihost_interval / leaf-vdevs\fR milliseconds. 
+This means that on average a multihost write will be issued for each leaf vdev every +\fBzfs_multihost_interval\fR milliseconds. In practice, the observed period can +vary with the I/O load and this observed value is the delay which is stored in +the uberblock. +.sp +On import the activity check waits a minimum amount of time determined by +\fBzfs_multihost_interval * zfs_multihost_import_intervals\fR. The activity +check time may be further extended if the value of mmp delay found in the best +uberblock indicates actual multihost updates happened at longer intervals than +\fBzfs_multihost_interval\fR. A minimum value of \fB100ms\fR is enforced. +.sp +Default value: \fB1000\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_multihost_import_intervals\fR (uint) +.ad +.RS 12n +Used to control the duration of the activity test on import. Smaller values of +\fBzfs_multihost_import_intervals\fR will reduce the import time but increase +the risk of failing to detect an active pool. The total activity check time is +never allowed to drop below one second. A value of 0 is ignored and treated as +if it was set to 1 +.sp +Default value: \fB10\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_multihost_fail_intervals\fR (uint) +.ad +.RS 12n +Controls the behavior of the pool when multihost write failures are detected. +.sp +When \fBzfs_multihost_fail_intervals = 0\fR then multihost write failures are ignored. +The failures will still be reported to the ZED which depending on its +configuration may take action such as suspending the pool or offlining a device. +.sp +When \fBzfs_multihost_fail_intervals > 0\fR then sequential multihost write failures +will cause the pool to be suspended. This occurs when +\fBzfs_multihost_fail_intervals * zfs_multihost_interval\fR milliseconds have +passed since the last successful multihost write. This guarantees the activity test +will see multihost writes if the pool is imported. +.sp +Default value: \fB5\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_no_scrub_io\fR (int) +.ad +.RS 12n +Set for no scrub I/O. This results in scrubs not actually scrubbing data and +simply doing a metadata crawl of the pool instead. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_no_scrub_prefetch\fR (int) +.ad +.RS 12n +Set to disable block prefetching for scrubs. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_nocacheflush\fR (int) +.ad +.RS 12n +Disable cache flush operations on disks when writing. Beware, this may cause +corruption if disks re-order writes. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_nopwrite_enabled\fR (int) +.ad +.RS 12n +Enable NOP writes +.sp +Use \fB1\fR for yes (default) and \fB0\fR to disable. +.RE + +.sp +.ne 2 +.na +\fBzfs_dmu_offset_next_sync\fR (int) +.ad +.RS 12n +Enable forcing txg sync to find holes. When enabled forces ZFS to act +like prior versions when SEEK_HOLE or SEEK_DATA flags are used, which +when a dnode is dirty causes txg's to be synced so that this data can be +found. +.sp +Use \fB1\fR for yes and \fB0\fR to disable (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_pd_bytes_max\fR (int) +.ad +.RS 12n +The number of bytes which should be prefetched during a pool traversal +(eg: \fBzfs send\fR or other data crawling operations) +.sp +Default value: \fB52,428,800\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_per_txg_dirty_frees_percent \fR (ulong) +.ad +.RS 12n +Tunable to control percentage of dirtied blocks from frees in one TXG. 
+After this threshold is crossed, additional dirty blocks from frees
+wait until the next TXG.
+A value of zero will disable this throttle.
+.sp
+Default value: \fB30\fR (use \fB0\fR to disable).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_prefetch_disable\fR (int)
+.ad
+.RS 12n
+This tunable disables predictive prefetch. Note that it leaves "prescient"
+prefetch (e.g. prefetch for zfs send) intact. Unlike predictive prefetch,
+prescient prefetch never issues i/os that end up not being needed, so it
+can't hurt performance.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_chunk_size\fR (long)
+.ad
+.RS 12n
+Bytes to read per chunk
+.sp
+Default value: \fB1,048,576\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_history\fR (int)
+.ad
+.RS 12n
+Historical statistics for the last N reads will be available in
+\fB/proc/spl/kstat/zfs/<pool>/reads\fR
+.sp
+Default value: \fB0\fR (no data is kept).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_read_history_hits\fR (int)
+.ad
+.RS 12n
+Include cache hits in read history
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_reconstruct_indirect_combinations_max\fR (int)
+.ad
+.RS 12n
+If an indirect split block contains more than this many possible unique
+combinations when being reconstructed, consider it too computationally
+expensive to check them all. Instead, try at most
+\fBzfs_reconstruct_indirect_combinations_max\fR randomly-selected
+combinations each time the block is accessed. This allows all segment
+copies to participate fairly in the reconstruction when all combinations
+cannot be checked and prevents repeated use of one bad copy.
+.sp
+Default value: \fB100\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_recover\fR (int)
+.ad
+.RS 12n
+Set to attempt to recover from fatal errors. This should only be used as a
+last resort, as it typically results in leaked space, or worse.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_resilver_min_time_ms\fR (int)
+.ad
+.RS 12n
+Resilvers are processed by the sync thread. While resilvering it will spend
+at least this much time working on a resilver between txg flushes.
+.sp
+Default value: \fB3,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_ignore_errors\fR (int)
+.ad
+.RS 12n
+If set to a nonzero value, remove the DTL (dirty time list) upon
+completion of a pool scan (scrub) even if there were unrepairable
+errors. It is intended to be used during pool repair or recovery to
+stop resilvering when the pool is next imported.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scrub_min_time_ms\fR (int)
+.ad
+.RS 12n
+Scrubs are processed by the sync thread. While scrubbing it will spend
+at least this much time working on a scrub between txg flushes.
+.sp
+Default value: \fB1,000\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_checkpoint_intval\fR (int)
+.ad
+.RS 12n
+To preserve progress across reboots the sequential scan algorithm periodically
+needs to stop metadata scanning and issue all the verification I/Os to disk.
+The frequency of this flushing is determined by the
+\fBzfs_scan_checkpoint_intval\fR tunable.
+.sp
+Default value: \fB7200\fR seconds (every 2 hours).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_fill_weight\fR (int)
+.ad
+.RS 12n
+This tunable affects how scrub and resilver I/O segments are ordered. A higher
+number indicates that we care more about how filled in a segment is, while a
+lower number indicates we care more about the size of the extent without
+considering the gaps within a segment. This value is only tunable upon module
+insertion. Changing the value afterwards will have no effect on scrub or
+resilver performance.
+.sp
+Default value: \fB3\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_issue_strategy\fR (int)
+.ad
+.RS 12n
+Determines the order that data will be verified while scrubbing or resilvering.
+If set to \fB1\fR, data will be verified as sequentially as possible, given the
+amount of memory reserved for scrubbing (see \fBzfs_scan_mem_lim_fact\fR). This
+may improve scrub performance if the pool's data is very fragmented. If set to
+\fB2\fR, the largest mostly-contiguous chunk of found data will be verified
+first. By deferring scrubbing of small segments, we may later find adjacent data
+to coalesce and increase the segment size. If set to \fB0\fR, zfs will use
+strategy \fB1\fR during normal verification and strategy \fB2\fR while taking a
+checkpoint.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_legacy\fR (int)
+.ad
+.RS 12n
+A value of 0 indicates that scrubs and resilvers will gather metadata in
+memory before issuing sequential I/O. A value of 1 indicates that the legacy
+algorithm will be used where I/O is initiated as soon as it is discovered.
+Changing this value to 0 will not affect scrubs or resilvers that are already
+in progress.
+.sp
+Default value: \fB0\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_max_ext_gap\fR (int)
+.ad
+.RS 12n
+Indicates the largest gap in bytes between scrub / resilver I/Os that will still
+be considered sequential for sorting purposes. Changing this value will not
+affect scrubs or resilvers that are already in progress.
+.sp
+Default value: \fB2097152 (2 MB)\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_mem_lim_fact\fR (int)
+.ad
+.RS 12n
+Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
+This tunable determines the hard limit for I/O sorting memory usage.
+When the hard limit is reached we stop scanning metadata and start issuing
+data verification I/O. This is done until we get below the soft limit.
+.sp
+Default value: \fB20\fR, which is 5% of RAM (1/20).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_mem_lim_soft_fact\fR (int)
+.ad
+.RS 12n
+The fraction of the hard limit used to determine the soft limit for I/O sorting
+by the sequential scan algorithm. When we cross this limit from below no action
+is taken. When we cross this limit from above it is because we are issuing
+verification I/O. In this case (unless the metadata scan is done) we stop
+issuing verification I/O and start scanning metadata again until we get to the
+hard limit.
+.sp
+Default value: \fB20\fR, which is 5% of the hard limit (1/20).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_scan_vdev_limit\fR (int)
+.ad
+.RS 12n
+Maximum amount of data that can be concurrently issued for scrubs and
+resilvers per leaf device, given in bytes.
+.sp
+Default value: \fB41943040\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_send_corrupt_data\fR (int)
+.ad
+.RS 12n
+Allow sending of corrupt data (ignore read/checksum errors when sending data)
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_send_queue_length\fR (int)
+.ad
+.RS 12n
+The maximum number of bytes allowed in the \fBzfs send\fR queue. This value
+must be at least twice the maximum block size in use.
+.sp
+Default value: \fB16,777,216\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fBzfs_recv_queue_length\fR (int)
+.ad
+.RS 12n
+The maximum number of bytes allowed in the \fBzfs receive\fR queue. This value
+must be at least twice the maximum block size in use.
+.sp +Default value: \fB16,777,216\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_sync_pass_deferred_free\fR (int) +.ad +.RS 12n +Flushing of data to disk is done in passes. Defer frees starting in this pass +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_sync_pass_dont_compress\fR (int) +.ad +.RS 12n +Don't compress starting in this pass +.sp +Default value: \fB5\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_sync_pass_rewrite\fR (int) +.ad +.RS 12n +Rewrite new block pointers starting in this pass +.sp +Default value: \fB2\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_sync_taskq_batch_pct\fR (int) +.ad +.RS 12n +This controls the number of threads used by the dp_sync_taskq. The default +value of 75% will create a maximum of one thread per cpu. +.sp +Default value: \fB75\fR%. +.RE + +.sp +.ne 2 +.na +\fBzfs_txg_history\fR (int) +.ad +.RS 12n +Historical statistics for the last N txgs will be available in +\fB/proc/spl/kstat/zfs/<pool>/txgs\fR +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_txg_timeout\fR (int) +.ad +.RS 12n +Flush dirty data to disk at least every N seconds (maximum txg duration) +.sp +Default value: \fB5\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_aggregation_limit\fR (int) +.ad +.RS 12n +Max vdev I/O aggregation size +.sp +Default value: \fB131,072\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_cache_bshift\fR (int) +.ad +.RS 12n +Shift size to inflate reads too +.sp +Default value: \fB16\fR (effectively 65536). +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_cache_max\fR (int) +.ad +.RS 12n +Inflate reads smaller than this value to meet the \fBzfs_vdev_cache_bshift\fR +size (default 64k). +.sp +Default value: \fB16384\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_cache_size\fR (int) +.ad +.RS 12n +Total size of the per-disk cache in bytes. +.sp +Currently this feature is disabled as it has been found to not be helpful +for performance and in some cases harmful. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_mirror_rotating_inc\fR (int) +.ad +.RS 12n +A number by which the balancing algorithm increments the load calculation for +the purpose of selecting the least busy mirror member when an I/O immediately +follows its predecessor on rotational vdevs for the purpose of making decisions +based on load. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_mirror_rotating_seek_inc\fR (int) +.ad +.RS 12n +A number by which the balancing algorithm increments the load calculation for +the purpose of selecting the least busy mirror member when an I/O lacks +locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within +this that are not immediately following the previous I/O are incremented by +half. +.sp +Default value: \fB5\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_mirror_rotating_seek_offset\fR (int) +.ad +.RS 12n +The maximum distance for the last queued I/O in which the balancing algorithm +considers an I/O to have locality. +See the section "ZFS I/O SCHEDULER". +.sp +Default value: \fB1048576\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_mirror_non_rotating_inc\fR (int) +.ad +.RS 12n +A number by which the balancing algorithm increments the load calculation for +the purpose of selecting the least busy mirror member on non-rotational vdevs +when I/Os do not immediately follow one another. +.sp +Default value: \fB0\fR. 
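+.sp
+As with the other mirror load\-balancing increments, this value can be read
+and changed at runtime through sysfs, or set persistently with a modprobe
+options file. A minimal sketch (the values shown are illustrative, not
+recommendations):
+.nf
+  # cat /sys/module/zfs/parameters/zfs_vdev_mirror_non_rotating_inc
+  # echo 0 > /sys/module/zfs/parameters/zfs_vdev_mirror_non_rotating_inc
+  # echo "options zfs zfs_vdev_mirror_non_rotating_inc=0" >> /etc/modprobe.d/zfs.conf
+.fi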
+.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int) +.ad +.RS 12n +A number by which the balancing algorithm increments the load calculation for +the purpose of selecting the least busy mirror member when an I/O lacks +locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within +this that are not immediately following the previous I/O are incremented by +half. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_read_gap_limit\fR (int) +.ad +.RS 12n +Aggregate read I/O operations if the gap on-disk between them is within this +threshold. +.sp +Default value: \fB32,768\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_scheduler\fR (charp) +.ad +.RS 12n +Set the Linux I/O scheduler on whole disk vdevs to this scheduler. Valid options +are noop, cfq, bfq & deadline +.sp +Default value: \fBnoop\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_write_gap_limit\fR (int) +.ad +.RS 12n +Aggregate write I/O over gap +.sp +Default value: \fB4,096\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_vdev_raidz_impl\fR (string) +.ad +.RS 12n +Parameter for selecting raidz parity implementation to use. + +Options marked (always) below may be selected on module load as they are +supported on all systems. +The remaining options may only be set after the module is loaded, as they +are available only if the implementations are compiled in and supported +on the running system. + +Once the module is loaded, the content of +/sys/module/zfs/parameters/zfs_vdev_raidz_impl will show available options +with the currently selected one enclosed in []. +Possible options are: + fastest - (always) implementation selected using built-in benchmark + original - (always) original raidz implementation + scalar - (always) scalar raidz implementation + sse2 - implementation using SSE2 instruction set (64bit x86 only) + ssse3 - implementation using SSSE3 instruction set (64bit x86 only) + avx2 - implementation using AVX2 instruction set (64bit x86 only) + avx512f - implementation using AVX512F instruction set (64bit x86 only) + avx512bw - implementation using AVX512F & AVX512BW instruction sets (64bit x86 only) + aarch64_neon - implementation using NEON (Aarch64/64 bit ARMv8 only) + aarch64_neonx2 - implementation using NEON with more unrolling (Aarch64/64 bit ARMv8 only) +.sp +Default value: \fBfastest\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_zevent_cols\fR (int) +.ad +.RS 12n +When zevents are logged to the console use this as the word wrap width. +.sp +Default value: \fB80\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_zevent_console\fR (int) +.ad +.RS 12n +Log events to the console +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzfs_zevent_len_max\fR (int) +.ad +.RS 12n +Max event queue length. A value of 0 will result in a calculated value which +increases with the number of CPUs in the system (minimum 64 events). Events +in the queue can be viewed with the \fBzpool events\fR command. +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_zil_clean_taskq_maxalloc\fR (int) +.ad +.RS 12n +The maximum number of taskq entries that are allowed to be cached. When this +limit is exceeded transaction records (itxs) will be cleaned synchronously. +.sp +Default value: \fB1048576\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_zil_clean_taskq_minalloc\fR (int) +.ad +.RS 12n +The number of taskq entries that are pre-populated when the taskq is first +created and are immediately available for use. +.sp +Default value: \fB1024\fR. 
+.RE + +.sp +.ne 2 +.na +\fBzfs_zil_clean_taskq_nthr_pct\fR (int) +.ad +.RS 12n +This controls the number of threads used by the dp_zil_clean_taskq. The default +value of 100% will create a maximum of one thread per cpu. +.sp +Default value: \fB100\fR%. +.RE + +.sp +.ne 2 +.na +\fBzil_replay_disable\fR (int) +.ad +.RS 12n +Disable intent logging replay. Can be disabled for recovery from corrupted +ZIL +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzil_slog_bulk\fR (ulong) +.ad +.RS 12n +Limit SLOG write size per commit executed with synchronous priority. +Any writes above that will be executed with lower (asynchronous) priority +to limit potential SLOG device abuse by single active ZIL writer. +.sp +Default value: \fB786,432\fR. +.RE + +.sp +.ne 2 +.na +\fBzio_delay_max\fR (int) +.ad +.RS 12n +A zevent will be logged if a ZIO operation takes more than N milliseconds to +complete. Note that this is only a logging facility, not a timeout on +operations. +.sp +Default value: \fB30,000\fR. +.RE + +.sp +.ne 2 +.na +\fBzio_dva_throttle_enabled\fR (int) +.ad +.RS 12n +Throttle block allocations in the ZIO pipeline. This allows for +dynamic allocation distribution when devices are imbalanced. +When enabled, the maximum number of pending allocations per top-level vdev +is limited by \fBzfs_vdev_queue_depth_pct\fR. +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzio_requeue_io_start_cut_in_line\fR (int) +.ad +.RS 12n +Prioritize requeued I/O +.sp +Default value: \fB0\fR. +.RE + +.sp +.ne 2 +.na +\fBzio_taskq_batch_pct\fR (uint) +.ad +.RS 12n +Percentage of online CPUs (or CPU cores, etc) which will run a worker thread +for IO. These workers are responsible for IO work such as compression and +checksum calculations. Fractional number of CPUs will be rounded down. +.sp +The default value of 75 was chosen to avoid using all CPUs which can result in +latency issues and inconsistent application performance, especially when high +compression is enabled. +.sp +Default value: \fB75\fR. +.RE + +.sp +.ne 2 +.na +\fBzvol_inhibit_dev\fR (uint) +.ad +.RS 12n +Do not create zvol device nodes. This may slightly improve startup time on +systems with a very large number of zvols. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.sp +.ne 2 +.na +\fBzvol_major\fR (uint) +.ad +.RS 12n +Major number for zvol block devices +.sp +Default value: \fB230\fR. +.RE + +.sp +.ne 2 +.na +\fBzvol_max_discard_blocks\fR (ulong) +.ad +.RS 12n +Discard (aka TRIM) operations done on zvols will be done in batches of this +many blocks, where block size is determined by the \fBvolblocksize\fR property +of a zvol. +.sp +Default value: \fB16,384\fR. +.RE + +.sp +.ne 2 +.na +\fBzvol_prefetch_bytes\fR (uint) +.ad +.RS 12n +When adding a zvol to the system prefetch \fBzvol_prefetch_bytes\fR +from the start and end of the volume. Prefetching these regions +of the volume is desirable because they are likely to be accessed +immediately by \fBblkid(8)\fR or by the kernel scanning for a partition +table. +.sp +Default value: \fB131,072\fR. +.RE + +.sp +.ne 2 +.na +\fBzvol_request_sync\fR (uint) +.ad +.RS 12n +When processing I/O requests for a zvol submit them synchronously. This +effectively limits the queue depth to 1 for each I/O submitter. When set +to 0 requests are handled asynchronously by a thread pool. The number of +requests which can be handled concurrently is controller by \fBzvol_threads\fR. +.sp +Default value: \fB0\fR. 
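+.sp
+A brief sketch of inspecting and changing this tunable at runtime (the sysfs
+path assumes the zfs module is loaded):
+.nf
+  # cat /sys/module/zfs/parameters/zvol_request_sync
+  # echo 1 > /sys/module/zfs/parameters/zvol_request_sync
+.fi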
+.RE + +.sp +.ne 2 +.na +\fBzvol_threads\fR (uint) +.ad +.RS 12n +Max number of threads which can handle zvol I/O requests concurrently. +.sp +Default value: \fB32\fR. +.RE + +.sp +.ne 2 +.na +\fBzvol_volmode\fR (uint) +.ad +.RS 12n +Defines zvol block devices behaviour when \fBvolmode\fR is set to \fBdefault\fR. +Valid values are \fB1\fR (full), \fB2\fR (dev) and \fB3\fR (none). +.sp +Default value: \fB1\fR. +.RE + +.sp +.ne 2 +.na +\fBzfs_qat_disable\fR (int) +.ad +.RS 12n +This tunable disables qat hardware acceleration for gzip compression and. +AES-GCM encryption. It is available only if qat acceleration is compiled in +and the qat driver is present. +.sp +Use \fB1\fR for yes and \fB0\fR for no (default). +.RE + +.SH ZFS I/O SCHEDULER +ZFS issues I/O operations to leaf vdevs to satisfy and complete I/Os. +The I/O scheduler determines when and in what order those operations are +issued. The I/O scheduler divides operations into five I/O classes +prioritized in the following order: sync read, sync write, async read, +async write, and scrub/resilver. Each queue defines the minimum and +maximum number of concurrent operations that may be issued to the +device. In addition, the device has an aggregate maximum, +\fBzfs_vdev_max_active\fR. Note that the sum of the per-queue minimums +must not exceed the aggregate maximum. If the sum of the per-queue +maximums exceeds the aggregate maximum, then the number of active I/Os +may reach \fBzfs_vdev_max_active\fR, in which case no further I/Os will +be issued regardless of whether all per-queue minimums have been met. +.sp +For many physical devices, throughput increases with the number of +concurrent operations, but latency typically suffers. Further, physical +devices typically have a limit at which more concurrent operations have no +effect on throughput or can actually cause it to decrease. +.sp +The scheduler selects the next operation to issue by first looking for an +I/O class whose minimum has not been satisfied. Once all are satisfied and +the aggregate maximum has not been hit, the scheduler looks for classes +whose maximum has not been satisfied. Iteration through the I/O classes is +done in the order specified above. No further operations are issued if the +aggregate maximum number of concurrent operations has been hit or if there +are no operations queued for an I/O class that has not hit its maximum. +Every time an I/O is queued or an operation completes, the I/O scheduler +looks for new operations to issue. +.sp +In general, smaller max_active's will lead to lower latency of synchronous +operations. Larger max_active's may lead to higher overall throughput, +depending on underlying storage. +.sp +The ratio of the queues' max_actives determines the balance of performance +between reads, writes, and scrubs. E.g., increasing +\fBzfs_vdev_scrub_max_active\fR will cause the scrub or resilver to complete +more quickly, but reads and writes to have higher latency and lower throughput. +.sp +All I/O classes have a fixed maximum number of outstanding operations +except for the async write class. Asynchronous writes represent the data +that is committed to stable storage during the syncing stage for +transaction groups. Transaction groups enter the syncing state +periodically so the number of queued async writes will quickly burst up +and then bleed down to zero. Rather than servicing them as quickly as +possible, the I/O scheduler changes the maximum number of active async +write I/Os according to the amount of dirty data in the pool. 
Since +both throughput and latency typically increase with the number of +concurrent operations issued to physical devices, reducing the +burstiness in the number of concurrent operations also stabilizes the +response time of operations from other -- and in particular synchronous +-- queues. In broad strokes, the I/O scheduler will issue more +concurrent operations from the async write queue as there's more dirty +data in the pool. +.sp +Async Writes +.sp +The number of concurrent operations issued for the async write I/O class +follows a piece-wise linear function defined by a few adjustable points. +.nf + + | o---------| <-- zfs_vdev_async_write_max_active + ^ | /^ | + | | / | | +active | / | | + I/O | / | | +count | / | | + | / | | + |-------o | | <-- zfs_vdev_async_write_min_active + 0|_______^______|_________| + 0% | | 100% of zfs_dirty_data_max + | | + | `-- zfs_vdev_async_write_active_max_dirty_percent + `--------- zfs_vdev_async_write_active_min_dirty_percent + +.fi +Until the amount of dirty data exceeds a minimum percentage of the dirty +data allowed in the pool, the I/O scheduler will limit the number of +concurrent operations to the minimum. As that threshold is crossed, the +number of concurrent operations issued increases linearly to the maximum at +the specified maximum percentage of the dirty data allowed in the pool. +.sp +Ideally, the amount of dirty data on a busy pool will stay in the sloped +part of the function between \fBzfs_vdev_async_write_active_min_dirty_percent\fR +and \fBzfs_vdev_async_write_active_max_dirty_percent\fR. If it exceeds the +maximum percentage, this indicates that the rate of incoming data is +greater than the rate that the backend storage can handle. In this case, we +must further throttle incoming writes, as described in the next section. + +.SH ZFS TRANSACTION DELAY +We delay transactions when we've determined that the backend storage +isn't able to accommodate the rate of incoming writes. +.sp +If there is already a transaction waiting, we delay relative to when +that transaction will finish waiting. This way the calculated delay time +is independent of the number of threads concurrently executing +transactions. +.sp +If we are the only waiter, wait relative to when the transaction +started, rather than the current time. This credits the transaction for +"time already served", e.g. reading indirect blocks. +.sp +The minimum time for a transaction to take is calculated as: +.nf + min_time = zfs_delay_scale * (dirty - min) / (max - dirty) + min_time is then capped at 100 milliseconds. +.fi +.sp +The delay has two degrees of freedom that can be adjusted via tunables. The +percentage of dirty data at which we start to delay is defined by +\fBzfs_delay_min_dirty_percent\fR. This should typically be at or above +\fBzfs_vdev_async_write_active_max_dirty_percent\fR so that we only start to +delay after writing at full speed has failed to keep up with the incoming write +rate. The scale of the curve is defined by \fBzfs_delay_scale\fR. Roughly speaking, +this variable determines the amount of delay at the midpoint of the curve. 
+.sp +.nf +delay + 10ms +-------------------------------------------------------------*+ + | *| + 9ms + *+ + | *| + 8ms + *+ + | * | + 7ms + * + + | * | + 6ms + * + + | * | + 5ms + * + + | * | + 4ms + * + + | * | + 3ms + * + + | * | + 2ms + (midpoint) * + + | | ** | + 1ms + v *** + + | zfs_delay_scale ----------> ******** | + 0 +-------------------------------------*********----------------+ + 0% <- zfs_dirty_data_max -> 100% +.fi +.sp +Note that since the delay is added to the outstanding time remaining on the +most recent transaction, the delay is effectively the inverse of IOPS. +Here the midpoint of 500us translates to 2000 IOPS. The shape of the curve +was chosen such that small changes in the amount of accumulated dirty data +in the first 3/4 of the curve yield relatively small differences in the +amount of delay. +.sp +The effects can be easier to understand when the amount of delay is +represented on a log scale: +.sp +.nf +delay +100ms +-------------------------------------------------------------++ + + + + | | + + *+ + 10ms + *+ + + ** + + | (midpoint) ** | + + | ** + + 1ms + v **** + + + zfs_delay_scale ----------> ***** + + | **** | + + **** + +100us + ** + + + * + + | * | + + * + + 10us + * + + + + + | | + + + + +--------------------------------------------------------------+ + 0% <- zfs_dirty_data_max -> 100% +.fi +.sp +Note here that only as the amount of dirty data approaches its limit does +the delay start to increase rapidly. The goal of a properly tuned system +should be to keep the amount of dirty data out of that range by first +ensuring that the appropriate limits are set for the I/O scheduler to reach +optimal throughput on the backend storage, and then by changing the value +of \fBzfs_delay_scale\fR to increase the steepness of the curve. diff --git a/man/man5/zpool-features.5 b/man/man5/zpool-features.5 new file mode 100644 index 000000000..ce34a05a2 --- /dev/null +++ b/man/man5/zpool-features.5 @@ -0,0 +1,720 @@ +'\" te +.\" Copyright (c) 2013, 2016 by Delphix. All rights reserved. +.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved. +.\" Copyright (c) 2014, Joyent, Inc. All rights reserved. +.\" The contents of this file are subject to the terms of the Common Development +.\" and Distribution License (the "License"). You may not use this file except +.\" in compliance with the License. You can obtain a copy of the license at +.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. +.\" +.\" See the License for the specific language governing permissions and +.\" limitations under the License. When distributing Covered Code, include this +.\" CDDL HEADER in each file and include the License file at +.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this +.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your +.\" own identifying information: +.\" Portions Copyright [yyyy] [name of copyright owner] +.TH ZPOOL-FEATURES 5 "Aug 27, 2013" +.SH NAME +zpool\-features \- ZFS pool feature descriptions +.SH DESCRIPTION +.sp +.LP +ZFS pool on\-disk format versions are specified via "features" which replace +the old on\-disk format numbers (the last supported on\-disk format number is +28). To enable a feature on a pool use the \fBupgrade\fR subcommand of the +\fBzpool\fR(8) command, or set the \fBfeature@\fR\fIfeature_name\fR property +to \fBenabled\fR. +.sp +.LP +The pool format does not affect file system version compatibility or the ability +to send file systems between pools. 
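+.sp
+.LP
+A feature can be enabled through its pool property, or all supported features
+can be enabled at once with \fBzpool upgrade\fR. A brief sketch of both
+mechanisms (the pool name \fItank\fR is illustrative):
+.sp
+.nf
+  # zpool set feature@async_destroy=enabled tank
+  # zpool upgrade tank
+  # zpool get feature@async_destroy tank
+.fi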
+.sp +.LP +Since most features can be enabled independently of each other the on\-disk +format of the pool is specified by the set of all features marked as +\fBactive\fR on the pool. If the pool was created by another software version +this set may include unsupported features. +.SS "Identifying features" +.sp +.LP +Every feature has a guid of the form \fIcom.example:feature_name\fR. The reverse +DNS name ensures that the feature's guid is unique across all ZFS +implementations. When unsupported features are encountered on a pool they will +be identified by their guids. Refer to the documentation for the ZFS +implementation that created the pool for information about those features. +.sp +.LP +Each supported feature also has a short name. By convention a feature's short +name is the portion of its guid which follows the ':' (e.g. +\fIcom.example:feature_name\fR would have the short name \fIfeature_name\fR), +however a feature's short name may differ across ZFS implementations if +following the convention would result in name conflicts. +.SS "Feature states" +.sp +.LP +Features can be in one of three states: +.sp +.ne 2 +.na +\fB\fBactive\fR\fR +.ad +.RS 12n +This feature's on\-disk format changes are in effect on the pool. Support for +this feature is required to import the pool in read\-write mode. If this +feature is not read-only compatible, support is also required to import the pool +in read\-only mode (see "Read\-only compatibility"). +.RE + +.sp +.ne 2 +.na +\fB\fBenabled\fR\fR +.ad +.RS 12n +An administrator has marked this feature as enabled on the pool, but the +feature's on\-disk format changes have not been made yet. The pool can still be +imported by software that does not support this feature, but changes may be made +to the on\-disk format at any time which will move the feature to the +\fBactive\fR state. Some features may support returning to the \fBenabled\fR +state after becoming \fBactive\fR. See feature\-specific documentation for +details. +.RE + +.sp +.ne 2 +.na +\fBdisabled\fR +.ad +.RS 12n +This feature's on\-disk format changes have not been made and will not be made +unless an administrator moves the feature to the \fBenabled\fR state. Features +cannot be disabled once they have been enabled. +.RE + +.sp +.LP +The state of supported features is exposed through pool properties of the form +\fIfeature@short_name\fR. +.SS "Read\-only compatibility" +.sp +.LP +Some features may make on\-disk format changes that do not interfere with other +software's ability to read from the pool. These features are referred to as +"read\-only compatible". If all unsupported features on a pool are read\-only +compatible, the pool can be imported in read\-only mode by setting the +\fBreadonly\fR property during import (see \fBzpool\fR(8) for details on +importing pools). +.SS "Unsupported features" +.sp +.LP +For each unsupported feature enabled on an imported pool a pool property +named \fIunsupported@feature_guid\fR will indicate why the import was allowed +despite the unsupported feature. Possible values for this property are: + +.sp +.ne 2 +.na +\fB\fBinactive\fR\fR +.ad +.RS 12n +The feature is in the \fBenabled\fR state and therefore the pool's on\-disk +format is still compatible with software that does not support this feature. +.RE + +.sp +.ne 2 +.na +\fB\fBreadonly\fR\fR +.ad +.RS 12n +The feature is read\-only compatible and the pool has been imported in +read\-only mode. 
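+.sp
+For example, a pool whose only unsupported features are read\-only compatible
+might be imported read\-only as follows (the pool name is illustrative):
+.nf
+  # zpool import -o readonly=on tank
+.fi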
+.RE + +.SS "Feature dependencies" +.sp +.LP +Some features depend on other features being enabled in order to function +properly. Enabling a feature will automatically enable any features it +depends on. +.SH FEATURES +.sp +.LP +The following features are supported on this system: +.sp +.ne 2 +.na +\fB\fBasync_destroy\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.delphix:async_destroy +READ\-ONLY COMPATIBLE yes +DEPENDENCIES none +.TE + +Destroying a file system requires traversing all of its data in order to +return its used space to the pool. Without \fBasync_destroy\fR the file system +is not fully removed until all space has been reclaimed. If the destroy +operation is interrupted by a reboot or power outage the next attempt to open +the pool will need to complete the destroy operation synchronously. + +When \fBasync_destroy\fR is enabled the file system's data will be reclaimed +by a background process, allowing the destroy operation to complete without +traversing the entire file system. The background process is able to resume +interrupted destroys after the pool has been opened, eliminating the need +to finish interrupted destroys as part of the open operation. The amount +of space remaining to be reclaimed by the background process is available +through the \fBfreeing\fR property. + +This feature is only \fBactive\fR while \fBfreeing\fR is non\-zero. +.RE + +.sp +.ne 2 +.na +\fB\fBempty_bpobj\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.delphix:empty_bpobj +READ\-ONLY COMPATIBLE yes +DEPENDENCIES none +.TE + +This feature increases the performance of creating and using a large +number of snapshots of a single filesystem or volume, and also reduces +the disk space required. + +When there are many snapshots, each snapshot uses many Block Pointer +Objects (bpobj's) to track blocks associated with that snapshot. +However, in common use cases, most of these bpobj's are empty. This +feature allows us to create each bpobj on-demand, thus eliminating the +empty bpobjs. + +This feature is \fBactive\fR while there are any filesystems, volumes, +or snapshots which were created after enabling this feature. +.RE + +.sp +.ne 2 +.na +\fB\fBfilesystem_limits\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.joyent:filesystem_limits +READ\-ONLY COMPATIBLE yes +DEPENDENCIES extensible_dataset +.TE + +This feature enables filesystem and snapshot limits. These limits can be used +to control how many filesystems and/or snapshots can be created at the point in +the tree on which the limits are set. + +This feature is \fBactive\fR once either of the limit properties has been +set on a dataset. Once activated the feature is never deactivated. +.RE + +.sp +.ne 2 +.na +\fB\fBlz4_compress\fR\fR +.ad +.RS 4n +.TS +l l . +GUID org.illumos:lz4_compress +READ\-ONLY COMPATIBLE no +DEPENDENCIES none +.TE + +\fBlz4\fR is a high-performance real-time compression algorithm that +features significantly faster compression and decompression as well as a +higher compression ratio than the older \fBlzjb\fR compression. +Typically, \fBlz4\fR compression is approximately 50% faster on +compressible data and 200% faster on incompressible data than +\fBlzjb\fR. It is also approximately 80% faster on decompression, while +giving approximately 10% better compression ratio. + +When the \fBlz4_compress\fR feature is set to \fBenabled\fR, the +administrator can turn on \fBlz4\fR compression on any dataset on the +pool using the \fBzfs\fR(8) command. 
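+For example (the dataset name is illustrative):
+.nf
+  # zfs set compression=lz4 tank/home
+.fi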
+.sp
+Please note that doing so will immediately activate the \fBlz4_compress\fR
+feature on the underlying pool. Also, all newly written metadata will be
+compressed with the \fBlz4\fR algorithm. Since this feature is not
+read-only compatible, this operation will render the pool unimportable
+on systems without support for the \fBlz4_compress\fR feature. Booting
+off of \fBlz4\fR-compressed root pools is supported.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBspacemap_histogram\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.delphix:spacemap_histogram
+READ\-ONLY COMPATIBLE	yes
+DEPENDENCIES	none
+.TE
+
+This feature allows ZFS to maintain more information about how free space
+is organized within the pool. If this feature is \fBenabled\fR, ZFS will
+set this feature to \fBactive\fR when a new space map object is created or
+an existing space map is upgraded to the new format. Once the feature is
+\fBactive\fR, it will remain in that state until the pool is destroyed.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBmulti_vdev_crash_dump\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.joyent:multi_vdev_crash_dump
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	none
+.TE
+
+This feature allows a dump device to be configured with a pool comprised
+of multiple vdevs. Those vdevs may be arranged in any mirrored or raidz
+configuration.
+
+When the \fBmulti_vdev_crash_dump\fR feature is set to \fBenabled\fR,
+the administrator can use the \fBdumpadm\fR(1M) command to configure a
+dump device on a pool comprised of multiple vdevs.
+
+Under Linux this feature is registered for compatibility but not used.
+New pools created under Linux will have the feature \fBenabled\fR but
+will never transition to \fBactive\fR. This functionality is not
+required in order to support crash dumps under Linux. Existing pools
+where this feature is \fBactive\fR can be imported.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBextensible_dataset\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.delphix:extensible_dataset
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	none
+.TE
+
+This feature allows more flexible use of internal ZFS data structures,
+and exists for other features to depend on.
+
+This feature will be \fBactive\fR when the first dependent feature uses it,
+and will be returned to the \fBenabled\fR state when all datasets that use
+this feature are destroyed.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBbookmarks\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.delphix:bookmarks
+READ\-ONLY COMPATIBLE	yes
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature enables use of the \fBzfs bookmark\fR subcommand.
+
+This feature is \fBactive\fR while any bookmarks exist in the pool.
+All bookmarks in the pool can be listed by running
+\fBzfs list -t bookmark -r \fIpoolname\fR\fR.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBenabled_txg\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.delphix:enabled_txg
+READ\-ONLY COMPATIBLE	yes
+DEPENDENCIES	none
+.TE
+
+Once this feature is enabled ZFS records the transaction group number
+in which new features are enabled. This has no user-visible impact,
+but other features may depend on this feature.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will
+never return to being \fBenabled\fR.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBhole_birth\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID com.delphix:hole_birth +READ\-ONLY COMPATIBLE no +DEPENDENCIES enabled_txg +.TE + +This feature improves performance of incremental sends ("zfs send -i") +and receives for objects with many holes. The most common case of +hole-filled objects is zvols. + +An incremental send stream from snapshot \fBA\fR to snapshot \fBB\fR +contains information about every block that changed between \fBA\fR and +\fBB\fR. Blocks which did not change between those snapshots can be +identified and omitted from the stream using a piece of metadata called +the 'block birth time', but birth times are not recorded for holes (blocks +filled only with zeroes). Since holes created after \fBA\fR cannot be +distinguished from holes created before \fBA\fR, information about every +hole in the entire filesystem or zvol is included in the send stream. + +For workloads where holes are rare this is not a problem. However, when +incrementally replicating filesystems or zvols with many holes (for +example a zvol formatted with another filesystem) a lot of time will +be spent sending and receiving unnecessary information about holes that +already exist on the receiving side. + +Once the \fBhole_birth\fR feature has been enabled the block birth times +of all new holes will be recorded. Incremental sends between snapshots +created after this feature is enabled will use this new metadata to avoid +sending information about holes that already exist on the receiving side. + +This feature becomes \fBactive\fR as soon as it is enabled and will +never return to being \fBenabled\fB. + +.RE + +.sp +.ne 2 +.na +\fB\fBembedded_data\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.delphix:embedded_data +READ\-ONLY COMPATIBLE no +DEPENDENCIES none +.TE + +This feature improves the performance and compression ratio of +highly-compressible blocks. Blocks whose contents can compress to 112 bytes +or smaller can take advantage of this feature. + +When this feature is enabled, the contents of highly-compressible blocks are +stored in the block "pointer" itself (a misnomer in this case, as it contains +the compressed data, rather than a pointer to its location on disk). Thus +the space of the block (one sector, typically 512 bytes or 4KB) is saved, +and no additional i/o is needed to read and write the data block. + +This feature becomes \fBactive\fR as soon as it is enabled and will +never return to being \fBenabled\fR. + +.RE +.sp +.ne 2 +.na +\fB\fBdevice_removal\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.delphix:device_removal +READ\-ONLY COMPATIBLE no +DEPENDENCIES none +.TE + +This feature enables the "zpool remove" subcommand to remove top-level +vdevs, evacuating them to reduce the total size of the pool. + +This feature becomes \fBactive\fR when the "zpool remove" command is used +on a top-level vdev, and will never return to being \fBenabled\fR. + +.RE +.sp +.ne 2 +.na +\fB\fBobsolete_counts\fR\fR +.ad +.RS 4n +.TS +l l . +GUID com.delphix:obsolete_counts +READ\-ONLY COMPATIBLE yes +DEPENDENCIES device_removal +.TE + +This feature is an enhancement of device_removal, which will over time +reduce the memory used to track removed devices. When indirect blocks +are freed or remapped, we note that their part of the indirect mapping +is "obsolete", i.e. no longer needed. See also the \fBzfs remap\fR +subcommand in \fBzfs\fR(1M). + +This feature becomes \fBactive\fR when the "zpool remove" command is +used on a top-level vdev, and will never return to being \fBenabled\fR. + +.RE +.sp +.ne 2 +.na +\fB\fBlarge_blocks\fR\fR +.ad +.RS 4n +.TS +l l . 
+GUID	org.open-zfs:large_blocks
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+The \fBlarge_blocks\fR feature allows the record size on a dataset to be
+set larger than 128KB.
+
+This feature becomes \fBactive\fR once a dataset contains a file with
+a block size larger than 128KB, and will return to being \fBenabled\fR once all
+filesystems that have ever had their recordsize larger than 128KB are destroyed.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBlarge_dnode\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.zfsonlinux:large_dnode
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+The \fBlarge_dnode\fR feature allows the size of dnodes in a dataset to be
+set larger than 512B.
+
+This feature becomes \fBactive\fR once a dataset contains an object with
+a dnode larger than 512B, which occurs as a result of setting the
+\fBdnodesize\fR dataset property to a value other than \fBlegacy\fR. The
+feature will return to being \fBenabled\fR once all filesystems that
+have ever contained a dnode larger than 512B are destroyed. Large dnodes
+allow more data to be stored in the bonus buffer, thus potentially
+improving performance by avoiding the use of spill blocks.
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBsha512\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.illumos:sha512
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature enables the use of the SHA-512/256 truncated hash algorithm
+(FIPS 180-4) for checksum and dedup. The native 64-bit arithmetic of
+SHA-512 provides an approximate 50% performance boost over SHA-256 on
+64-bit hardware and is thus a good minimum-change replacement candidate
+for systems where hash performance is important, but these systems
+cannot for whatever reason utilize the faster \fBskein\fR and
+\fBedonr\fR algorithms.
+
+When the \fBsha512\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBsha512\fR checksum on any dataset using the
+\fBzfs set checksum=sha512\fR command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBsha512\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBsha512\fR are destroyed.
+
+Booting off of pools utilizing SHA-512/256 is supported (provided that
+the updated GRUB stage2 module is installed).
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBskein\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.illumos:skein
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature enables the use of the Skein hash algorithm for checksum
+and dedup. Skein is a high-performance secure hash algorithm that was a
+finalist in the NIST SHA-3 competition. It provides a very high security
+margin and high performance on 64-bit hardware (80% faster than
+SHA-256). This implementation also utilizes the new salted checksumming
+functionality in ZFS, which means that the checksum is pre-seeded with a
+secret 256-bit random key (stored on the pool) before being fed the data
+block to be checksummed. Thus the produced checksums are unique to a
+given pool, preventing hash collision attacks on systems with dedup.
+
+When the \fBskein\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBskein\fR checksum on any dataset using the
+\fBzfs set checksum=skein\fR command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBskein\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBskein\fR are destroyed.
+
+Booting off of pools using \fBskein\fR is \fBNOT\fR supported
+-- any attempt to enable \fBskein\fR on a root pool will fail with an
+error.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBedonr\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.illumos:edonr
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature enables the use of the Edon-R hash algorithm for checksum,
+including for nopwrite (if compression is also enabled, an overwrite of
+a block whose checksum matches the data being written will be ignored).
+In an abundance of caution, Edon-R cannot be used with dedup
+(without verification).
+
+Edon-R is a very high-performance hash algorithm that was part
+of the NIST SHA-3 competition. It provides extremely high hash
+performance (over 350% faster than SHA-256), but was not selected
+because of its unsuitability as a general purpose secure hash algorithm.
+This implementation utilizes the new salted checksumming functionality
+in ZFS, which means that the checksum is pre-seeded with a secret
+256-bit random key (stored on the pool) before being fed the data block
+to be checksummed. Thus the produced checksums are unique to a given
+pool.
+
+When the \fBedonr\fR feature is set to \fBenabled\fR, the administrator
+can turn on the \fBedonr\fR checksum on any dataset using the
+\fBzfs set checksum=edonr\fR command. This feature becomes
+\fBactive\fR once a \fBchecksum\fR property has been set to \fBedonr\fR,
+and will return to being \fBenabled\fR once all filesystems that have
+ever had their checksum set to \fBedonr\fR are destroyed.
+
+Booting off of pools using \fBedonr\fR is \fBNOT\fR supported
+-- any attempt to enable \fBedonr\fR on a root pool will fail with an
+error.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBuserobj_accounting\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.zfsonlinux:userobj_accounting
+READ\-ONLY COMPATIBLE	yes
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature allows administrators to account for object usage information
+by user and group.
+
+This feature becomes \fBactive\fR as soon as it is enabled and will never
+return to being \fBenabled\fR. Each filesystem will be upgraded automatically
+when remounted, or when new files are created under that filesystem.
+The upgrade can also be started manually on filesystems by running
+`zfs set version=current <pool/fs>`. The upgrade process runs in the background
+and may take a while to complete for filesystems containing a large number of
+files.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBencryption\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	com.datto:encryption
+READ\-ONLY COMPATIBLE	no
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature enables the creation and management of natively encrypted datasets.
+
+This feature becomes \fBactive\fR when an encrypted dataset is created and will
+be returned to the \fBenabled\fR state when all datasets that use this feature
+are destroyed.
+
+.RE
+
+.sp
+.ne 2
+.na
+\fB\fBproject_quota\fR\fR
+.ad
+.RS 4n
+.TS
+l l .
+GUID	org.zfsonlinux:project_quota
+READ\-ONLY COMPATIBLE	yes
+DEPENDENCIES	extensible_dataset
+.TE
+
+This feature allows administrators to account for space and object usage
+against the project identifier (ID).
+
+The project ID is a new object-based attribute. When upgrading an existing
+filesystem, objects without a project ID attribute will be assigned a
+project ID of zero. After this feature is enabled, a newly created object
+will inherit its parent directory's project ID if the parent's inherit flag
+is set (via \fBchattr +/-P\fR or \fBzfs project [-s|-C]\fR). Otherwise, the
+new object's project ID will be set to zero. An object's project ID can be
+changed at any time by the owner (or privileged user) via
+\fBchattr -p $prjid\fR or \fBzfs project -p $prjid\fR.
+
+This feature will become \fBactive\fR as soon as it is enabled and will never
+return to being \fBenabled\fR. Each filesystem will be upgraded automatically
+when remounted or when a new file is created under that filesystem. The upgrade
+can also be triggered on filesystems via `zfs set version=current <pool/fs>`.
+The upgrade process runs in the background and may take a while to complete
+for filesystems containing a large number of files.
+
+.RE
+
+.SH "SEE ALSO"
+\fBzpool\fR(8)