| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.
Note that the 'fletcher' algorithm requires 64-bit
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only have up to
32-bit operations, so no 'fletcher'.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #9539
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #9867
| |
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Christian Schwarz <[email protected]>
Closes #9867
| |
As an alternative to the dataset name, zdb now allows the decimal
or hexadecimal objset ID to be specified. When permanent errors
are reported as 2 hexadecimal numbers (objset ID : object ID) in
zpool status, you can now use 'zdb <pool>[/objset ID] object' to
determine the names of the objset and object which have the error.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #9733
| |
Update the project website links contained in the repository to
reference the secure https://zfsonlinux.org address.
Reviewed-By: Richard Laager <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Garrett Fields <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9837
| |
This commit adds the --saved (-S) flag to the 'zfs send' command.
This flag allows a user to send a partially received dataset,
which can be useful when migrating a backup server to new
hardware. This flag is compatible with resumable receives, so
even if the saved send is interrupted, it can be resumed.
The flag does not require any user / kernel ABI changes or any
new feature flags in the send stream format.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Reviewed-by: Paul Zuchowski <[email protected]>
Reviewed-by: Christian Schwarz <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Closes #9007
| |
For dedup, special and log devices "zpool add -n" does not correctly
print their vdev type:
~# zpool add -n pool dedup /tmp/dedup special /tmp/special log /tmp/log
would update 'pool' to the following configuration:
pool
/tmp/normal
/tmp/dedup
/tmp/special
/tmp/log
This could lead storage administrators to modify their ZFS pools to
unexpected and unintended vdev configurations.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9783
Closes #9390
| |
Include checksums in the output of 'zdb -dddddd' along
with other indirect block information already displayed.
Example output follows (with long lines trimmed):
$ zdb -dddddd tank/fish 128
Dataset tank/fish [ZPL], ID 259, cr_txg 10, 16.2M, 93 objects, rootbp DV
Object lvl iblk dblk dsize dnsize lsize %full type
128 2 128K 128K 634K 512 1M 100.00 ZFS plain f
168 bonus System attri
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
dnode maxblkid: 7
path /c
uid 0
gid 0
atime Sat Dec 21 10:49:26 2019
mtime Sat Dec 21 10:49:26 2019
ctime Sat Dec 21 10:49:26 2019
crtime Sat Dec 21 10:49:26 2019
gen 41
mode 100755
size 964592
parent 34
links 1
pflags 40800000104
Indirect blocks:
0 L1 0:2c0000:400 0:c021e00:400 20000L/400P F=8 B=41/41
0 L0 0:227800:13800 20000L/13800P F=1 B=41/41 cksum=167a
20000 L0 0:25ec00:17c00 20000L/17c00P F=1 B=41/41 cksum=2312
40000 L0 0:276800:18400 20000L/18400P F=1 B=41/41 cksum=24e0
60000 L0 0:2a7800:18800 20000L/18800P F=1 B=41/41 cksum=25be
80000 L0 0:28ec00:18c00 20000L/18c00P F=1 B=41/41 cksum=2579
a0000 L0 0:24d000:11c00 20000L/11c00P F=1 B=41/41 cksum=140a
c0000 L0 0:23b000:12000 20000L/12000P F=1 B=41/41 cksum=164e
e0000 L0 0:221e00:5a00 20000L/5a00P F=1 B=41/41 cksum=9de790
segment [0000000000000000, 0000000000100000) size 1M
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Closes #9765
| |
If the ZFS_COLOR env variable is set, then use ANSI color
output in zpool status:
- Column headers are bold
- Degraded or offline pools/vdevs are yellow
- Non-zero error counters and faulted vdevs/pools are red
- The 'status:' and 'action:' sections are yellow if they're
displaying a warning.
This also includes a new 'faketty' function in libtest.shlib that is
compatible with FreeBSD (code provided by @freqlabs).
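The coloring rules listed above can be sketched as follows. This is only an illustration in Python, not the zpool code itself (which is C); the helper names and the state strings are assumptions made for the example, while the escape sequences are the standard ANSI SGR codes for bold, red and yellow.

```python
import os

# Standard ANSI SGR escape sequences.
ANSI_RESET = "\033[0m"
ANSI_BOLD = "\033[1m"
ANSI_RED = "\033[31m"
ANSI_YELLOW = "\033[33m"


def color_enabled(environ=None):
    """Color is used only when the ZFS_COLOR variable is set."""
    env = os.environ if environ is None else environ
    return "ZFS_COLOR" in env


def colorize(text, code, enabled):
    """Wrap text in an ANSI code, or return it unchanged when disabled."""
    return f"{code}{text}{ANSI_RESET}" if enabled else text


def color_for_state(state):
    """Hypothetical mapping: faulted is red, degraded/offline is yellow."""
    if state == "FAULTED":
        return ANSI_RED
    if state in ("DEGRADED", "OFFLINE"):
        return ANSI_YELLOW
    return None


def format_state(state, enabled):
    code = color_for_state(state)
    return colorize(state, code, enabled) if code else state


def format_error_count(count, enabled):
    """Non-zero error counters are shown in red."""
    text = str(count)
    return colorize(text, ANSI_RED, enabled) if count else text


def format_header(text, enabled):
    """Column headers are shown in bold."""
    return colorize(text, ANSI_BOLD, enabled)
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.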
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #9340
| |
As indicated by the VERIFY the local who_perm variable can never
be NULL in parse_fs_perm(). Due to the existence of the is_set
conditional, which is always true, cppcheck 1.88 was reporting
a possible NULL reference. Resolve the issue by removing the
extraneous is_set variable.
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #9732
| |
Add the 'zfs jail/unjail' subcommands along with the relevant
documentation from FreeBSD. This feature is not supported on
Linux and still requires the matching kernel ioctls which will
be included when the FreeBSD platform code is integrated.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9686
| |
This interferes with zdb_read_block trying all the decompression
algorithms when the 'd' flag is specified, as some are
expected to fail. Also control the output when guessing
algorithms, try the more common compression types first, allow
specifying lsize/psize, and fix an uninitialized variable.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #9612
Closes #9630
| |
Remove the ASSERTV macro and handle suppressing unused
compiler warnings for variables only in ASSERTs using the
__attribute__((unused)) compiler annotation. The annotation
is understood by both gcc and clang.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9671
| |
arc_summary3 reports L2ARC hits and misses as Bytes, whereas they
should be reported as events. arc_summary2 reports these correctly.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #9669
| |
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #9644
Closes #9657
| |
Adding the FreeBSD code allows arc_summary and arcstat
to be used on FreeBSD.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9641
| |
The function zdb_read_block (zdb -R) was always intended to have a :c
flag which would read the DVA and length supplied by the user, and
display the checksum. Since we don't know which checksum goes with
the data, we should calculate and display them all.
For each checksum in the table, read in the data at the supplied
DVA:length, calculate the checksum, and display it. Update the man
page and create a zfs test for the new feature.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: Paul Zuchowski <[email protected]>
Closes #9607
| |
Provide a common zfs_file_* interface which can be implemented on all
platforms to perform normal file access from either the kernel module
or the libzpool library.
This allows all non-portable vnode_t usage in the common code to be
replaced by the new portable zfs_file_t. The associated vnode and
kobj compatibility functions, types, and macros have been removed
from the SPL. Moving forward, vnodes should only be used in platform
specific code when provided by the native operating system.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9556
| |
df58307 removed the need to specify -d 1 when zfs list and zfs get are
called with -t snapshot on a dataset. This commit extends the same
behaviour to -t bookmark.
This commit also introduces the 'snap' shorthand for snapshots from
zfs list to zfs get.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: InsanePrawn <[email protected]>
Closes #9589
| |
Use sys.platform to choose the correct implementation of functions and
values of variables for the platform being run on.
Reword some comments to avoid describing implementation details in the
wrong places.
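A minimal sketch of dispatching on sys.platform is shown here. The function names are hypothetical and the two locations are only illustrative of where ARC statistics are commonly exposed on each platform; the scripts touched by this change may organize the selection differently.

```python
import sys


def _arcstats_source_linux():
    # Illustrative: procfs path where the Linux module exposes arcstats.
    return "/proc/spl/kstat/zfs/arcstats"


def _arcstats_source_freebsd():
    # Illustrative: sysctl node holding the same statistics on FreeBSD.
    return "kstat.zfs.misc.arcstats"


def arcstats_source(platform=None):
    """Pick the platform-specific implementation by sys.platform prefix."""
    platform = sys.platform if platform is None else platform
    # Match by prefix: the value may carry a version suffix, e.g. "freebsd12".
    if platform.startswith("freebsd"):
        return _arcstats_source_freebsd()
    if platform.startswith("linux"):
        return _arcstats_source_linux()
    raise NotImplementedError(f"unsupported platform: {platform}")
```

Matching by prefix rather than equality keeps the check valid on interpreters that append the OS major version to the platform string.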
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9561
| |
zvol_wait waits for zvol links to be created under /dev/zvol for each zvol.
Links are not created for redacted zvols so we should ignore those.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Zakharov <[email protected]>
Closes #9545
| |
A struct rangelock already exists on FreeBSD. Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
This change is a follow up to 2cc479d0.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9534
| |
The custom ECKSUM errno is defined as appropriate by the
platform specific os/linux/spl/sys/errno.h header.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9537
| |
FreeBSD has no analog. Buffered block devices were removed a decade
plus ago.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9508
| |
In the original implementation, zpool history reads the whole history
before printing anything, causing memory usage to grow unbounded. We fix
this by breaking it into read-print iterations.
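The read-print iteration pattern can be sketched as follows. This is a Python illustration rather than the C code in zpool; read_chunk is a hypothetical callback standing in for whatever fetches the next batch of history records starting at a given offset.

```python
def iter_history(read_chunk, chunk_size=4):
    """Yield history records while holding only one chunk in memory."""
    offset = 0
    while True:
        records = read_chunk(offset, chunk_size)
        if not records:
            break  # no more history to read
        yield from records
        offset += len(records)


def print_history(read_chunk, out=print, chunk_size=4):
    """Print each record as soon as its chunk has been read."""
    for record in iter_history(read_chunk, chunk_size):
        out(record)
```

Peak memory is then bounded by the chunk size rather than by the total length of the history.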
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #9516
| |
Consistently use the `zfs_ioctl()` wrapper since `ioctl()` cannot be
called directly due to differing semantics between platforms.
Follow up PR to #9492.
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9507
| |
We don't need to include stdio_ext.h
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9483
| |
Not all versions of sed have the --in-place flag. Detect support for
the flag during ./configure and provide a fallback mechanism for those
systems where sed's behavior differs. The autoconf variable
${ac_inplace} can be used to choose the correct flags for editing a
file in place with sed.
Replace violating usages in Makefile.am with ${ac_inplace}.
Reviewed-by: Chris Dunlop <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9463
| |
In the FreeBSD kernel the strdup signature is:
```
char *strdup(const char *__restrict, struct malloc_type *);
```
It's unfortunate that the developers have chosen to change
the signature of libc functions - but it's what I have to
deal with.
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9433
| |
This patch implements a new tree structure for ZFS, and uses it to
store range trees more efficiently.
The new structure is approximately a B-tree, though there are some
small differences from the usual characterizations. The tree has core
nodes and leaf nodes; each contains data elements, with the elements
in the core nodes acting as separators between their children. The
difference between core and leaf nodes is that the core nodes have an
array of children, while leaf nodes don't. Every node in the tree may
be only partially full; in most cases, they are all at least 50% full
(in terms of element count) except for the root node, which can be
less full. Underfull nodes will steal from their neighbors or merge to
remain full enough, while overfull nodes will split in two. The data
elements are contained in tree-controlled buffers; they are copied
into these on insertion, and overwritten on deletion. This means that
the elements are not independently allocated, which reduces overhead,
but also means they can't be shared between trees (and also that
pointers to them are only valid until a side-effectful tree operation
occurs). The overhead varies based on how dense the tree is, but is
usually on the order of about 50% of the element size; the per-node
overheads are very small, and so don't make a significant difference.
The trees can accept arbitrary records; they accept a size and a
comparator to allow them to be used for a variety of purposes.
The new trees replace the AVL trees used in the range trees today.
Currently, the range_seg_t structure contains three 8 byte integers
of payload and two 24 byte avl_tree_node_ts to handle its storage in
both an offset-sorted tree and a size-sorted tree (total size: 64
bytes). In the new model, the range seg structures are usually two 4
byte integers, but a separate one needs to exist for the size-sorted
and offset-sorted tree. Between the raw size, the 50% overhead, and
the double storage, the new btrees are expected to use 8*1.5*2 = 24
bytes per record, or 33.3% as much memory as the AVL trees (this is
for the purposes of storing metaslab range trees; for other purposes,
like scrubs, they use ~50% as much memory).
We reduced the size of the payload in the range segments by teaching
range trees about starting offsets and shifts; since metaslabs have a
fixed starting offset, and they all operate in terms of disk sectors,
we can store the ranges using 4-byte integers as long as the size of
the metaslab divided by the sector size is less than 2^32. For 512-byte
sectors, this is a 2^41 (or 2TB) metaslab, which with the default
settings corresponds to a 256PB disk. 4k sector disks can handle
metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not
anticipate disks of this size in the near future, there should be
almost no cases where metaslabs need 64-bit integers to store their
ranges. We do still have the capability to store 64-bit integer ranges
to account for cases where we are storing per-vdev (or per-dnode) trees,
which could reasonably go above the limits discussed. We also do not
store fill information in the compact version of the node, since it
is only used for sorted scrub.
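The compact encoding described above (a fixed starting offset plus a sector shift) can be sketched like this. It is a Python illustration under stated assumptions, not the range tree code: the function names, the little-endian packing and the error handling are all chosen for the example.

```python
import struct

SEG32_LIMIT = 1 << 32  # compact fields are 4-byte unsigned integers


def encode_seg32(start, end, base, shift):
    """Pack an absolute byte range into two 4-byte sector counts.

    base is the fixed starting offset of the metaslab and shift is
    log2 of the sector size (both hypothetical parameter names).
    """
    unit = 1 << shift
    rel_start, rel_end = start - base, end - base
    if rel_start < 0 or rel_end < rel_start:
        raise ValueError("range must lie at or after the base offset")
    if rel_start % unit or rel_end % unit:
        raise ValueError("range must be sector aligned")
    s, e = rel_start >> shift, rel_end >> shift
    if e >= SEG32_LIMIT:
        raise OverflowError("range needs the 64-bit form")
    return struct.pack("<II", s, e)


def decode_seg32(buf, base, shift):
    """Recover the absolute byte range from the packed form."""
    s, e = struct.unpack("<II", buf)
    return base + (s << shift), base + (e << shift)
```

With 512-byte sectors the shift is 9, so the 32-bit limit corresponds to 2^32 * 2^9 = 2^41 bytes, which is the 2TB metaslab bound mentioned above.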
We also optimized the metaslab loading process in various other ways
to offset some inefficiencies in the btree model. While individual
operations (find, insert, remove_from) are faster for the btree than
they are for the avl tree, remove usually requires a find operation,
while in the AVL tree model the element itself suffices. Some clever
changes actually caused an overall speedup in metaslab loading; we use
approximately 40% less cpu to load metaslabs in our tests on Illumos.
Another memory and performance optimization was achieved by changing
what is stored in the size-sorted trees. When a disk is heavily
fragmented, the df algorithm used by default in ZFS will almost always
find a number of small regions in its initial cursor-based search; it
will usually only fall back to the size-sorted tree to find larger
regions. If we increase the size of the cursor-based search slightly,
and don't store segments that are smaller than a tunable size floor
in the size-sorted tree, we can further cut memory usage down to
below 20% of what the AVL trees store. This also results in further
reductions in CPU time spent loading metaslabs.
The 16KiB size floor was chosen because it results in substantial memory
usage reduction while not usually resulting in situations where we can't
find an appropriate chunk with the cursor and are forced to use an
oversized chunk from the size-sorted tree. In addition, even if we do
have to use an oversized chunk from the size-sorted tree, the chunk
would be too small to use for ZIL allocations, so it isn't as big of a
loss as it might otherwise be. And often, more small allocations will
follow the initial one, and the cursor search will now find the
remainder of the chunk we didn't use all of and use it for subsequent
allocations. Practical testing has shown little or no change in
fragmentation as a result of this change.
If the size-sorted tree becomes empty while the offset sorted one still
has entries, it will load all the entries from the offset sorted tree
and disregard the size floor until it is unloaded again. This operation
occurs rarely with the default setting, only on incredibly thoroughly
fragmented pools.
There are some other small changes to zdb to teach it to handle btrees,
but nothing major.
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9181
| |
Factor Linux specific functionality out of libzutil.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9356
| |
Factor Linux specific pieces out of libspl.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9336
| |
Factor Linux specific functions out of the zpool command.
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Matt Macy <[email protected]>
Closes #9333
| |
Allow ZED notification via slack incoming webhook.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Ben McGough <[email protected]>
Closes #9076
Closes #9350
| |
The difference between the sizes could be positive or negative. Leaving
the types as unsigned means the result overflows when the difference is
negative and removing the labs() means we'll have introduced a bug. The
subtraction results in the correct value when the unsigned integer is
interpreted as a signed integer by labs().
Clang doesn't see that we're doing a subtraction and abusing the types.
It sees the result of the subtraction, an unsigned value, being passed
to an absolute value function and emits a warning which we treat as an
error.
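The wraparound behaviour described above can be modelled with explicit 64-bit masking. This Python sketch assumes a 64-bit long, as on LP64 platforms, and the helper names are hypothetical; it only emulates what the C expression does.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def unsigned_sub(a, b):
    """Subtract as a 64-bit unsigned type would: wrap modulo 2**64."""
    return (a - b) & MASK64


def as_signed(x):
    """Reinterpret a 64-bit unsigned bit pattern as two's complement."""
    return x - (1 << 64) if x & SIGN64 else x


def abs_diff(a, b):
    """Model of labs() applied to the unsigned difference."""
    return abs(as_signed(unsigned_sub(a, b)))
```

The reinterpretation only gives the true distance when it fits in a signed 64-bit value, which is the case for the sizes compared here.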
Reviewed-by: Youzhong Yang <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9355
| |
Move the trailing newlines from the error message strings to the format
strings to more closely match the other error messages.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9330
| |
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.
This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:
- Scrubs or resilvers to complete
- Devices to be initialized
- Devices to be replaced
- Devices to be removed
- Checkpoints to be discarded
- Background freeing to complete
For example, a scrub that is in progress could be waited for by running
zpool wait -t scrub <pool>
This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.
This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communication primarily for the
sake of portability.
Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:
- Added ZoL-style ioctl input declaration.
- Reorganized error handling in zpool_initialize in libzfs to integrate
better with changes made for TRIM support.
- Fixed check for whether a checkpoint discard is in progress.
Previously it also waited if the pool had a checkpoint, instead of
just if a checkpoint was being discarded.
- Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
- Updated more existing tests to make use of new 'zpool wait'
functionality, tests that don't exist in Delphix OS.
- Used existing ZoL tunable zfs_scan_suspend_progress, together with
zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
- Added support for a non-integral interval argument to zpool wait.
Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: John Gallagher <[email protected]>
Closes #9162
| |
/usr/bin/env python3 is the suggested[1] shebang for Python in general
(likewise for python2) and is conventional across platforms. This eases
development on systems where python is not installed in /usr/bin
(FreeBSD for example) and makes it possible to develop in virtual
environments (venv) for isolating dependencies.
Many packaging guidelines discourage the use of /usr/bin/env, but since
this is the canonical way of writing shebangs in the Python community,
many packaging scripts are already equipped to handle substituting the
appropriate absolute path to python automatically.
Some RPM package builders lacking brp-mangle-shebangs need a small
fallback mechanism in the package spec to stamp the appropriate shebang
on installed Python scripts.
[1]: https://docs.python.org/3/using/unix.html?#miscellaneous
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9314
| |
Code for interfacing with procfs for kstats and tunables is Linux-
specific. A more generic interface can be used for the abstractions of
loading kstats and various tunable parameters, allowing other platforms
to implement the functions cleanly. In a similar vein, determining the
ZFS/SPL version can be abstracted away in order for other platforms to
provide their own implementations of this function.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9279
| |
A more generic interface can be used for the abstraction of loading
kstats, allowing other platforms to implement the function cleanly.
In a similar vein, loading tunables can be abstracted away in order for
other platforms to provide their own implementations of this function.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Macy <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes #9277
| |
Both 'detach' and 'online' zpool subcommands, when provided with an
unsupported option, forget to print it in the error message:
# zpool online -t rpool vda3
invalid option ''
usage:
online [-e] <pool> <device> ...
This change fixes the error message in order to include the actual
option that is not supported.
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #9270
| |
Partially received zvols won't have links in /dev/zvol.
Reviewed-by: Sebastien Roy <[email protected]>
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Pavel Zakharov <[email protected]>
Closes #9260
| |
Reviewed-by: Ryan Moeller <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Andrea Gelmini <[email protected]>
Closes #9234
| |
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.
Signed-off-by: Ryan Moeller <[email protected]>
| |
When a pool is imported it will scan the pool to verify the integrity
of the data and metadata. The amount it scans will depend on the
import flags provided. On systems with small amounts of memory or
when importing a pool from the crash kernel, it's possible for
spa_load_verify to issue so many I/Os that it consumes all the memory
of the system resulting in an OOM message or a hang.
To prevent this, we limit the amount of memory that the initial pool
scan can consume. This change will, by default, use 1/16th of the ARC
for scan I/Os to prevent running the system out of memory during import.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: George Wilson <[email protected]>
External-issue: DLPX-65237
External-issue: DLPX-65238
Closes #9146
| |
When we unload metaslabs today in ZFS, the cached max_size value is
discarded. We instead use the histogram to determine whether or not we
think we can satisfy an allocation from the metaslab. This can result in
situations where, if we're doing I/Os of a size not aligned to a
histogram bucket, a metaslab is loaded even though it cannot satisfy the
allocation we think it can. For example, a metaslab with 16 entries in
the 16k-32k bucket may have entirely 16kB entries. If we try to allocate
a 24kB buffer, we will load that metaslab because we think it should be
able to handle the allocation. Doing so is expensive in CPU time, disk
reads, and average IO latency. This is exacerbated if the write being
attempted is a sync write.
This change makes ZFS cache the max_size after the metaslab is
unloaded. If we ever get a free (or a coalesced group of frees) larger
than the max_size, we will update it. Otherwise, we leave it as is. When
attempting to allocate, we use the max_size as a lower bound, and
respect it unless we are in try_hard. However, we do age the max_size
out at some point, since we expect the actual max_size to increase as we
do more frees. A more sophisticated algorithm here might be helpful, but
this works reasonably well.
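The caching policy can be sketched as follows. This is a simplified Python model, not the metaslab code: the class and method names are hypothetical, the max_age value is an arbitrary placeholder, and once the cached value has aged out the sketch simply answers "load" instead of falling back to the histogram as the text describes.

```python
class UnloadedMetaslab:
    """Tracks the max_size cached when a metaslab was unloaded."""

    def __init__(self, max_size, unload_time):
        self.max_size = max_size
        self.unload_time = unload_time

    def note_free(self, size):
        """A free (or coalesced group of frees) can only raise max_size."""
        if size > self.max_size:
            self.max_size = size

    def should_load(self, asize, now, try_hard=False, max_age=3600):
        """Decide whether loading is worthwhile for an allocation of asize."""
        # try_hard ignores the cached value; so does an aged-out cache.
        if try_hard or now - self.unload_time > max_age:
            return True
        # The cached value is a lower bound on the largest free segment.
        return asize <= self.max_size
```

For the example above, a metaslab unloaded with a 16kB max_size is skipped for a 24kB allocation until a larger free is observed, try_hard is set, or the cached value ages out.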
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Closes #9055
| |
zed core dumps due to a NULL pointer in zfs_agent_iter_vdev(). The
gs_devid is NULL, but the nvl has a "devid" entry.
zfs_agent_post_event() checks that ZFS_EV_VDEV_GUID or DEV_IDENTIFIER is
present in nvl, but then later it and zfs_agent_iter_vdev() assume that
DEV_IDENTIFIER is present and thus gs_devid is set.
Typically this is not a problem because usually either all vdevs have
devid's, or none of them do. Since zfs_agent_iter_vdev() first checks if
the vdev has devid before dereferencing gs_devid, the problem isn't
typically encountered. However, if some vdevs have devid's and some do
not, then the problem is easily reproduced. This can happen if the pool
has been moved from a system that has devid's to one that does not.
The fix is for zfs_agent_iter_vdev() to only try to match the devid's if
both nvl and gsp have devid's present.
Reviewed-by: Prashanth Sreenivasa <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: loli10K <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
External-issue: DLPX-65090
Closes #9054
Closes #9060
| |
Deleting a clone requires finding blocks that are clone-only, not shared
with the snapshot. This was done by traversing the entire block tree
which results in a large performance penalty for sparsely
written clones.
This new method keeps track of clone blocks when they are
modified in a "Livelist" so that, when it’s time to delete,
the clone-specific blocks are already at hand.
We see performance improvements because now deletion work is
proportional to the number of clone-modified blocks, not the size
of the original dataset.
Reviewed-by: Sean Eric Fagan <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Sara Hartse <[email protected]>
Closes #8416
| |
In the past we've seen multiple race conditions that have
to do with open-context threads, async threads, and concurrent
calls to spa_export()/spa_destroy() (including the one
referenced in issue #9015).
This patch ensures that only one thread can execute the
main body of spa_export_common() at a time, with subsequent
threads returning with a new error code created just for
this situation, eliminating this way any race condition
bugs introduced by concurrent calls to this function.
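The single-entry guard can be sketched as follows. This is a Python model of the idea only: the class, the exception and the flag name are hypothetical stand-ins for the in-kernel state and the new error code, whose actual names are not given here.

```python
import threading


class ExportBusyError(Exception):
    """Stand-in for the new error code returned to later callers."""


class Pool:
    def __init__(self):
        self._lock = threading.Lock()
        self._exporting = False

    def export(self, body):
        """Run body() unless another export is already in progress."""
        with self._lock:
            if self._exporting:
                # Do not wait; fail fast with the dedicated error.
                raise ExportBusyError("export already in progress")
            self._exporting = True
        try:
            return body()
        finally:
            # Clear the flag so a later export can proceed.
            with self._lock:
                self._exporting = False
```

The lock protects only the flag, so the main body never runs under it and a second caller returns immediately instead of blocking.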
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #9015
Closes #9044
| |
Creating a pool with no features enabled and running
`zdb -mmmmmm` on it before the patch:
```
Log Space Maps in Pool:
Log Space Map Obsolete Entry Statistics:
0 valid entries out of 0 - txg 0
0 valid entries out of 0 - total
```
After this patch the above output goes away.
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Sara Hartse <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes #9048
|