| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Move the synchronization of inode/znode i_flags/pflags into
the respective internal zfs function. This is mostly
mechanical work and shouldn't introduce any functional
changes.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Issue #227
Closes #5223
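A minimal sketch of the kind of flag translation such an internal helper
performs (the helper name is hypothetical; the flag constants are the real
ZFS/VFS ones):

    /* hypothetical helper: mirror znode pflags into the Linux inode flags */
    static void
    zfs_set_inode_flags_sketch(znode_t *zp, struct inode *ip)
    {
            uint64_t pflags = zp->z_pflags;

            if (pflags & ZFS_IMMUTABLE)
                    ip->i_flags |= S_IMMUTABLE;
            else
                    ip->i_flags &= ~S_IMMUTABLE;

            if (pflags & ZFS_APPENDONLY)
                    ip->i_flags |= S_APPEND;
            else
                    ip->i_flags &= ~S_APPEND;
    }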
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:150953,type: uninitialized scalar variable
coverity scan CID:147603,type: Resource leak
coverity scan CID:147610,type: Resource leak
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5209
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
extensible dataset
Authored by: ilovezfs <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Richard Laager <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported by: Tony Hutter <[email protected]>
In any pool without the extensible dataset feature flag already enabled,
creating a dataset with dedup set to use one of the new checksums would
result in the following panic as soon as any data was added:
panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature,
&refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c
line 390
Inspection showed that feature->fi_feature was 7, which is the value of
SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum. This commit
adds extensible dataset as a dependency for the sha512, edonr, and skein
feature flags, which prevents the panic.
OpenZFS-issue: https://www.illumos.org/issues/6585
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/892586e8a147c02d7f4053cc405229a13e796928
Porting Notes:
This code was originally from Illumos, but I actually ported it from:
openzfsonosx/zfs@b62a652
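For illustration only (not the exact diff), a feature's dependency list in
zfeature_common.c is an array terminated by SPA_FEATURE_NONE, so the fix
amounts to something like:

    /*
     * illustrative sketch: sha512 (and skein, edonr) now depend on
     * extensible_dataset; the array is passed when the feature is registered
     */
    static const spa_feature_t sha512_deps[] = {
            SPA_FEATURE_EXTENSIBLE_DATASET,
            SPA_FEATURE_NONE
    };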
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the dedup property value
Authored by: ilovezfs <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Richard Laager <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: Tony Hutter <[email protected]>
zio_checksum_to_feature() expects a zio_checksum enum, not a raw property
intval, so the new checksums weren't being detected when the
ZIO_CHECKSUM_VERIFY flag got in the way.
Given a pool without feature@sha512,
zfs create -o dedup=sha512 naughty/fivetwelve_noverify_ds
would fail as expected since the raw intval would indeed be equal to
SPA_FEATURE_SHA512.
However,
zfs create -o dedup=sha512,verify naughty/fivetwelve_verify_ds
would incorrectly succeed because ZIO_CHECKSUM_VERIFY would be in the
way, the raw intval would not be a member of the enum, and
zio_checksum_to_feature() would return SPA_FEATURE_NONE, with the result
that spa_feature_is_enabled() would never be called.
This was first detected with edonr, since in that case verify is
required.
This commit clears the ZIO_CHECKSUM_VERIFY flag before calling
zio_checksum_to_feature() using the ZIO_CHECKSUM_MASK and verifies in
zio_checksum_to_feature() that ZIO_CHECKSUM_MASK has been applied by the
caller to attempt to prevent the same bug from occurring again in the
future.
OpenZFS-issue: https://www.illumos.org/issues/6541
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/971640e6aa954c91b0706543741aa4570299f4d7
Porting notes:
This code was originally from Illumos, but I actually ported it from:
openzfsonosx/zfs@bef06e1
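A hedged sketch of the two sides of the fix (variable names illustrative):

    /* caller: strip the verify flag before mapping checksum -> feature */
    spa_feature_t feature =
        zio_checksum_to_feature(checksum & ZIO_CHECKSUM_MASK);

    /* zio_checksum_to_feature(): verify the caller applied the mask */
    ASSERT((cksum & ~ZIO_CHECKSUM_MASK) == 0);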
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewed by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Saso Kiselkov <[email protected]>
Reviewed by: Richard Lowe <[email protected]>
Approved by: Garrett D'Amore <[email protected]>
Ported by: Tony Hutter <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee
Porting Notes:
This code is ported on top of the Illumos Crypto Framework code:
https://github.com/zfsonlinux/zfs/pull/4329/commits/b5e030c8dbb9cd393d313571dee4756fbba8c22d
The list of porting changes includes:
- Copied module/icp/include/sha2/sha2.h directly from illumos
- Removed from module/icp/algs/sha2/sha2.c:
#pragma inline(SHA256Init, SHA384Init, SHA512Init)
- Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since
it now takes in an extra parameter.
- Added CTASSERT() to assert.h for module/zfs/edonr_zfs.c
- Added skein & edonr to libicp/Makefile.am
- Added sha512.S. It was generated from sha512-x86_64.pl in Illumos.
- Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument.
- In icp/algs/edonr/edonr_byteorder.h, removed the #if defined(__linux) section
to not #include the non-existent endian.h.
- In skein_test.c, rename NULL to 0 in "no test vector" array entries to get
around a compiler warning.
- Fixup test files:
- Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>,
- Remove <note.h> and define NOTE() as NOP.
- Define u_longlong_t
- Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p"
- Rename NULL to 0 in "no test vector" array entries to get around a
compiler warning.
- Remove "for isa in $($ISAINFO); do" stuff
- Add/update Makefiles
- Add some userspace headers like stdio.h/stdlib.h in place of
sys/types.h.
- EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules.
- Update scripts/zfs2zol-patch.sed
- include <sys/sha2.h> in sha2_impl.h
- Add sha2.h to include/sys/Makefile.am
- Add skein and edonr dirs to icp Makefile
- Add new checksums to zpool_get.cfg
- Move checksum switch block from zfs_secpolicy_setprop() to
zfs_check_settable()
- Fix -Wuninitialized error in edonr_byteorder.h on PPC
- Fix stack frame size errors on ARM32
- Don't unroll loops in Skein on 32-bit to save stack space
- Add memory barriers in sha2.c on 32-bit to save stack space
- Add filetest_001_pos.ksh checksum sanity test
- Add option to write pseudorandom data in file_write utility
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This re-uses the framework established for SSE2, SSSE3 and
AVX2. However, GCC uses FP registers on Aarch64, so
unlike SSE/AVX2 we can't rely on the registers being left alone
between ASM statements. So instead, the NEON code uses
C variables and GCC extended ASM syntax. Note that since
the kernel explicitly disables vector registers, they
have to be locally re-enabled.
As we use the variable's number to define the symbolic
name, and GCC won't allow duplicate symbolic names,
numbers have to be unique, even when the code is not
going to be used (e.g. the case for 4 registers when
using the macro with only 2). Only the actually used
variables should be declared, otherwise the build
will fail in debug mode.
This requires the replacement of the XOR(X,X) syntax
by a new ZERO(X) macro, which does the same thing but
without repeating the argument. And perhaps someday
there will be a machine where there is a more efficient
way to zero a register than XOR with itself. This affects
scalar, SSE2, SSSE3 and AVX2 as they need the new macro.
It's possible to write faster implementations (different
scheduling, different unrolling, interleaving NEON and
scalar, ...) for various cores, but this one has the
advantage of fitting in the current state of the code,
and thus is likely easier to review/check/merge.
The only difference between aarch64-neon and aarch64-neonx2
is that aarch64-neonx2 unrolls some functions further.
Reviewed-by: Gvozden Neskovic <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Romain Dolbeau <[email protected]>
Closes #4801
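A heavily simplified sketch of the ZERO(X) idea using GCC extended asm on
AArch64 (illustrative only; the real macros in the NEON implementation are
considerably more involved):

    #include <arm_neon.h>

    /* zero a NEON register without naming the operand twice */
    #define ZERO(r) \
            asm volatile("eor %[x].16b, %[x].16b, %[x].16b" : [x] "=w" (r))

    void
    zero_example(void)
    {
            uint8x16_t v0;          /* only declare variables that are used */

            ZERO(v0);               /* replaces the old XOR(v0, v0) form */
            (void) v0;
    }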
|
|
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147448,type: unchecked return value
coverity scan CID:147449,type: unchecked return value
coverity scan CID:147450,type: unchecked return value
coverity scan CID:147453,type: unchecked return value
coverity scan CID:147454,type: unchecked return value
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5206
|
|
|
|
|
|
|
|
|
|
| |
In the default case the function must return to avoid dereferencing
'prov_mech' which will be NULL.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: candychencan <[email protected]>
Closes #5134
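The shape of the fix, sketched (mechanism names are stand-ins, not the actual
ICP identifiers):

    switch (mech_type) {
    case EXAMPLE_MECH:                      /* stand-in */
            prov_mech = &example_mech;
            break;
    default:
            return;                         /* without this, prov_mech stays
                                               NULL and is dereferenced below */
    }
    /* ... prov_mech->... is used from here on ... */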
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147563, Type:dereference null return value
coverity scan CID:147560, Type:dereference null return value
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5168
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147531,type: Argument cannot be negative
- may copy data with negative size
coverity scan CID:147532,type: resource leaks
- may close a fd which is negative
coverity scan CID:147533,type: resource leaks
- may call pwrite64 with a negative size
coverity scan CID:147535,type: resource leaks
- may call fdopen with a negative fd
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: GeLiXin <[email protected]>
Closes #5176
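A generic sketch of the pattern these fixes apply (path and buffer names are
placeholders): check for a negative result before using it as a file
descriptor or a size.

    /* sketch only */
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
            return (errno);         /* never hand a negative fd to fdopen() */

    n = read(fd, buf, buflen);
    if (n < 0) {
            (void) close(fd);
            return (errno);         /* never use a negative value as a size */
    }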
|
|
|
|
|
|
|
|
|
|
|
| |
Cppcheck 1.63 erroneously complains about an uninitialized value
in buf_init(). Newer versions of cppcheck (1.72) handle this
correctly but we'll initialize the value anyway to silence the
warning.
Reviewed-by: Richard Elling <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5203
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid calculating (1<<64) if lh_prefix_len == 0. Semantics of the method remain
the same.
Assert (lh_prefix_len > 0) in zap_expand_leaf() to detect the same
potential problem there.
Issue #4883
Signed-off-by: Gvozden Neskovic <[email protected]>
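The shape of the guard, sketched (not the actual ZAP function):

    if (lh_prefix_len == 0)
            return (0);                     /* avoid evaluating 1ULL << 64;
                                               semantics are unchanged */
    return (1ULL << (64 - lh_prefix_len));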
|
|
|
|
|
|
|
|
|
| |
Explicitly promote variables to the correct type. Undefined behavior is
reported because the width of int is not fixed by the C standard.
Issue #4883
Signed-off-by: Gvozden Neskovic <[email protected]>
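For example (illustrative), shifting a plain int by 32 or more is undefined
even if the result is stored in a 64-bit variable, so the operand is promoted
first:

    #include <stdint.h>

    uint64_t
    shift_example(int shift)                /* assume 32 <= shift < 64 */
    {
            uint64_t bad  = 1 << shift;             /* UB: shifts a 32-bit int */
            uint64_t good = (uint64_t)1 << shift;   /* promote, then shift */

            (void) bad;
            return (good);
    }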
|
|
|
|
|
|
|
|
|
|
|
|
| |
An undefined operation is reported when running ztest (or zloop) compiled with
the GCC UndefinedBehaviorSanitizer. The error only happens at the top level of
dnode indirection with large enough offset values. Logically, the left shift
operation would work, but the bit-shift semantics in C, and the limits of
uint64_t, do not produce the desired result.
Issue #5059, #4883
Signed-off-by: Gvozden Neskovic <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Isaac Huang <[email protected]>
Closes #5181
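The kernel API involved, sketched (the actual change lives in the vdev disk
layer):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* submit every BIO that makes up the large ZIO here */
    blk_finish_plug(&plug);     /* lets the block layer merge adjacent BIOs */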
|
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147658, Type:copy into fixed size buffer.
coverity scan CID:147652, Type:copy into fixed size buffer.
coverity scan CID:147651, Type:copy into fixed size buffer.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5160
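The usual remedy for this class of warning, sketched (buffer and source names
are illustrative):

    char name[MAXNAMELEN];

    /* bound the copy to the destination size instead of using strcpy() */
    (void) strlcpy(name, src, sizeof (name));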
|
|
|
|
|
|
|
|
|
|
| |
Refactor the code in such a way so that inode->i_mode is being set
at the same time zp->z_mode is being changed. This has the effect of
keeping both in sync without relying on zfs_inode_update.
Reviewed-by: Richard Laager <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Closes #5158
|
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147650, Type:copy into fixed size buffer.
coverity scan CID:147649, Type:copy into fixed size buffer.
coverity scan CID:147647, Type:copy into fixed size buffer.
coverity scan CID:147646, Type:copy into fixed size buffer.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: cao.xuewen <[email protected]>
Closes #5161
|
|
|
|
|
|
|
|
|
| |
In arc_state_fini() the `arc_l2c_only->arcs_list[*]` multilists
must be destroyed. This accidentally regressed in d3c2ae1c.
Reviewed by: Tom Caputi <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #5151
Closes #5152
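A sketch of the restored teardown (the exact call form depends on how
arcs_list is declared in this version):

    /* arc_state_fini(): the l2c_only lists must be destroyed too */
    multilist_destroy(&arc_l2c_only->arcs_list[ARC_BUFC_METADATA]);
    multilist_destroy(&arc_l2c_only->arcs_list[ARC_BUFC_DATA]);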
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself, causing a deadlock.
Tested-by: satmandu
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5124
Closes #5141
Closes #5147
Closes #5148
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
includes BEGIN and END records
Authored by: Matt Krantz <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Igor Kozhukhov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: kernelOfTruth <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7230
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/12b90ee2
Closes #5112
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
coverity scan CID:147633,type: sizeof not portable
coverity scan CID:147637,type: sizeof not portable
coverity scan CID:147638,type: sizeof not portable
coverity scan CID:147640,type: sizeof not portable
In these particular cases sizeof (XX **) happens to be equal to sizeof (X *),
but this is not a portable assumption.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5144
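Illustrative shape of the fix (X is a stand-in type): size each element as a
pointer to X, not as the array's own pointer type.

    X **array;

    /* each element is an X *, so use sizeof (X *), not sizeof (X **) */
    array = kmem_alloc(n * sizeof (X *), KM_SLEEP);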
|
|
|
|
|
|
|
|
|
| |
dbuf_read_impl() returns (SET_ERROR(err)) when err can be 0, which adds
lots of noise in tracing logs.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Isaac Huang <[email protected]>
Closes #4430
Closes #5146
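The shape of the fix, sketched: only wrap genuine errors so the SET_ERROR
tracepoint doesn't fire for err == 0.

    /* before: return (SET_ERROR(err)); -- traces even when err == 0 */
    return (err == 0 ? 0 : SET_ERROR(err));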
|
|
|
|
|
|
|
|
| |
coverity scan CID:147504 Type: Explicit null dereferenced
Reason: passing null pointer dl to zfs_dirent_unlock
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: BearBabyLiu <[email protected]>
Closes #5131
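Sketch of the guard the fix amounts to:

    if (dl != NULL)
            zfs_dirent_unlock(dl);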
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type of "adjustmnt" was erroneously changed to unsigned when the compressed
ARC code was ported in d3c2ae1c0806b183a315e3d43cc8018cfdca79b5.
As a result of it being unsigned, the balanced metadata eviction logic
would evict all of the non-metadata.
Reviewed-by: Chris Severance <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: David Quigley <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5128
Closes #5129
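Illustrative only (names simplified, helper hypothetical): with a signed type,
being under the meta limit is a negative adjustment rather than a value near
2^64, so non-metadata is only evicted when metadata is actually over its limit.

    int64_t adjustmnt =
        (int64_t)arc_meta_used - (int64_t)arc_meta_limit;

    if (adjustmnt > 0)
            evict_non_metadata(adjustmnt);      /* hypothetical helper */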
|
|
|
|
|
|
|
| |
CID 147659, 150952 and 147645
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: luozhengzheng <[email protected]>
Closes #5103
|
|
|
|
|
|
|
|
|
| |
Enable ignore_hole_birth by default until all known hole birth bugs
have been resolved and relevant test cases added.
Reviewed-by: Boris Protopopov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #4809
Closes #5099
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simplify time handling in zfs_setattr by mimicking the logic in
setattr_copy from the Linux kernel. In order to achieve this,
in the case when the ZFS log is being replayed it is necessary
to unconditionally set the ctime in zfs_replay_setattr.
Also use the timespec_trunc function when assigning values to the
generic inode struct. This is currently a no-op since ZFS sets
s_time_gran to 1; however, in the future the rules about precision might
change.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Closes #4916
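For illustration, the granularity-aware assignment looks roughly like this in
kernels of that era (field names from the Linux inode):

    /* truncate to the filesystem's declared granularity before storing */
    ip->i_ctime = timespec_trunc(now, ip->i_sb->s_time_gran);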
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZFS doesn't provide a custom update_time method, meaning it delegates
this job to the generic VFS layer. The only time it needs to
set the various *time values is when the inode is being marshalled
to/from the disk. Do this by moving the relevant code from
zfs_inode_update_impl to zfs_znode_alloc and zfs_rezget. As a result
of this change it is no longer necessary to have multiple versions
of the zfs_inode_update function, so just nuke them and leave only
one.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Chunwei Chen <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Issue #227
Closes #4916
|
|
|
|
|
|
|
|
| |
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue #5078
|
|
|
|
|
|
|
|
| |
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Reviewed by: David Quigley <[email protected]>
Signed-off-by: Tom Caputi <[email protected]>
Issue #5078
|
|
|
|
|
|
|
|
| |
Authored by: Dan Kimmel <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
Issue #5078
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Authored by: George Wilson <[email protected]>
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: Dan Kimmel <[email protected]>
Reviewed by: Matt Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Tom Caputi <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
Ported by: David Quigley <[email protected]>
This review covers the reading and writing of compressed arc headers, sharing
data between the arc_hdr_t and the arc_buf_t, and the implementation of a new
dbuf cache to keep frequently accessed data uncompressed.
I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs
off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block
for that DVA. The physical block may or may not be compressed. If compressed
arc is enabled and the block on-disk is compressed, then the b_pdata will match
the block on-disk and remain compressed in memory. If the block on disk is not
compressed, then neither will the b_pdata. Lastly, if compressed arc is
disabled, then b_pdata will always be an uncompressed version of the on-disk
block.
Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict
any arc_buf_t's that are no longer referenced. This means that the arc will
primarily have compressed blocks as the arc_buf_t's are considered overhead and
are always uncompressed. When a consumer reads a block we first look to see if
the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new
arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If
the hdr already has an arc_buf_t, then we will allocate an additional arc_buf_t
and bcopy the uncompressed contents from the first arc_buf_t to the new one.
Writing to the compressed arc requires that we first discard the b_pdata since
the physical block is about to be rewritten. The new data contents will be
passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we
will copy the physical block contents to a newly allocated b_pdata.
When an l2arc is in use it will also take advantage of the b_pdata. Now the
l2arc will always write the contents of b_pdata to the l2arc. This means that
when compressed arc is enabled, the l2arc blocks are identical to those
stored in the main data pool. This provides a significant advantage since we
can leverage the bp's checksum when reading from the l2arc to determine if the
contents are valid. If the compressed arc is disabled, then we must first
transform the read block to look like the physical block in the main data pool
before comparing the checksum and determining whether it's valid.
OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7fc10f0
Issue #5078
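A heavily simplified sketch of the read path described above (not the actual
ARC functions; refcounting, locking and header state are omitted, and the
helpers are hypothetical):

    /* hdr cached: satisfy the read from b_pdata */
    if (hdr->b_l1hdr.b_pdata != NULL) {
            arc_buf_t *buf = alloc_arc_buf(hdr);            /* hypothetical */

            if (block_is_compressed(hdr))                   /* hypothetical */
                    decompress(hdr->b_l1hdr.b_pdata, buf->b_data);
            else
                    bcopy(hdr->b_l1hdr.b_pdata, buf->b_data, size);
    }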
|
|
|
|
|
|
|
|
|
|
|
|
| |
Several assignments to arc_c had no effect because it is ultimately
initialized to arc_c_max.
This aligns ZoL better with the upstream code which removed these
assignments some time ago.
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5081
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case sav->sav_config was NULL the body of the function
would skip the iteration of the l2cache devices and would
just clean up the old devices. However, this wasn't very obvious
since the null check was performed after the loop body and after
the old devices were cleaned up. Refactor the code so that it's now
obvious when the iteration of the l2cache devices is skipped.
This fixes the following cppcheck warning:
[module/zfs/spa.c:1552]: (error) Possible null pointer dereference: newvdevs
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Closes #5087
|
|
|
|
|
|
|
|
|
|
|
| |
Since they're allocated with spa_strdup(), they should be freed with
spa_strfree() so the proper length buffer is freed.
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #5082
Closes #5086
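The pairing the fix restores, sketched:

    char *path = spa_strdup(vd->vdev_path);
    /* ... */
    spa_strfree(path);      /* frees the same length that spa_strdup() allocated */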
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This first phase brings over the ZFS SLM module, zfs_mod.c, to handle
auto operations in response to disk events. Disk event monitoring is
provided from libudev and generates the expected payload schema for
zfs_mod. This work leverages the recently added devid and phys_path
strings in the vdev label.
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #4673
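A minimal libudev monitoring sketch of the sort this work builds on (error
handling omitted; the real event loop polls the monitor fd and translates
uevents into the zfs_mod payload):

    #include <libudev.h>

    static void
    monitor_disks(void)
    {
            struct udev *udev = udev_new();
            struct udev_monitor *mon =
                udev_monitor_new_from_netlink(udev, "udev");

            udev_monitor_filter_add_match_subsystem_devtype(mon, "block", "disk");
            udev_monitor_enable_receiving(mon);

            for (;;) {
                    /* real code poll()s udev_monitor_get_fd() before this */
                    struct udev_device *dev = udev_monitor_receive_device(mon);

                    if (dev != NULL) {
                            /* build the payload schema zfs_mod expects */
                            udev_device_unref(dev);
                    }
            }
    }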
|
|
|
|
|
|
| |
Signed-off-by: luozhengzheng <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5055
|
|
|
|
|
|
|
|
|
| |
These allocations can never fail. Leaving the error handling
code here gives the impression they can, so it has been removed.
Signed-off-by: luozhengzheng <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5048
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
perf: 2.75x faster ddt_entry_compare()
The first 256 bits of ddt_key_t are a block checksum, which is expected
to be close to random data. Hence, on average, the comparison only needs to
look at the first few bytes of the keys. To reduce the number of conditional
jump instructions, the result is computed as: sign(memcmp(k1, k2)).
The sign of an integer 'a' can be obtained as `(0 < a) - (a < 0)`, which
yields {-1, 0, 1} and is computed efficiently. Synthetic performance evaluation
of the original and new algorithm over 1G random keys on a 2.6GHz Intel(R)
Xeon(R) CPU E5-2660 v3:
old 6.85789 s
new 2.49089 s
perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
Compute the result directly instead of using conditionals
perf: zfs_range_compare()
Speedup between 1.1x - 2.5x, depending on compiler version and
optimization level.
perf: spa_error_entry_compare()
`bcmp()` is not suitable for comparator use. Use `memcmp()` instead.
perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
perf: 2.8x faster zil_bp_compare()
perf: 2.8x faster mze_compare()
perf: faster dbuf_compare()
perf: faster compares in spa_misc
perf: 2.8x faster layout_hash_compare()
perf: 2.8x faster space_reftree_compare()
perf: libzfs: faster avl tree comparators
perf: guid_compare()
perf: dsl_deadlist_compare()
perf: perm_set_compare()
perf: 2x faster range_tree_seg_compare()
perf: faster unique_compare()
perf: faster vdev_cache _compare()
perf: faster vdev_uberblock_compare()
perf: faster fuid _compare()
perf: faster zfs_znode_hold_compare()
Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Richard Elling <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5033
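The pattern, sketched with one comparator (member access abbreviated; the
in-tree versions differ in detail):

    static int
    ddt_entry_compare(const void *x1, const void *x2)
    {
            const ddt_entry_t *dde1 = x1;
            const ddt_entry_t *dde2 = x2;
            int cmp = memcmp(&dde1->dde_key, &dde2->dde_key, sizeof (ddt_key_t));

            /* sign(memcmp()) without conditional jumps: -1, 0 or 1 */
            return ((0 < cmp) - (cmp < 0));
    }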
|
|
|
|
|
|
|
|
|
|
|
|
| |
`zpool get guid,freeing,leaked` shows SOURCE as `default`; it should
be `-` as those props are not editable.
Changed the code to not overwrite `src` for `ZPOOL_PROP_VERSION`, so it
stays `ZPROP_SRC_NONE`. Made `src` const to avoid future mistakes.
Signed-off-by: Hajo Möller <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4170
|
|
|
|
|
|
|
|
|
|
|
| |
zfsctl_snapdir_inactive is defined in zfs-0.6.3. In zfs-0.6.5.7
this declaration remains even though the implementation was
removed in commit 278bee93. Also removed fastreboot_disable_highpil,
which is likewise unused.
Signed-off-by: caoxuewen [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5042
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From a user's perspective, I would expect that ZFS is always able
to remove files and directories even when the quota is exceeded.
Authored by: Simon Klinkert <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6940
OpenZFS-issue: https://www.illumos.org/issues/6334
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/9918916
Closes #5044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For quite some time I was thinking about the possibility of prefetching
ZFS indirection tables while doing sequential reads or writes.
Recent changes in the predictive prefetcher made that much easier to
do. My tests on a zvol with 16KB block size on a 5x striped and 2x
mirrored pool of 10 disks show almost double the throughput on sequential
read, and almost triple on sequential rewrite. While for reads a similar
effect can be obtained by increasing the maximum prefetch distance
(though at a higher memory cost), for rewrite there is no other
solution so far.
Authored by: Alexander Motin <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Ported-by: kernelOfTruth [email protected]
Signed-off-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6322
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/cb92f413
Closes #5040
Porting notes:
- Change from upstream in module/zfs/dbuf.c in 'int dbuf_read' due
to commit 5f6d0b6 'Handle block pointers with a corrupt logical size'
- Difference from upstream in module/zfs/dmu_zfetch.c,
uint32_t zfetch_max_idistance -> unsigned int zfetch_max_idistance
- Variables have been initialized at the beginning of the function
(void dmu_zfetch) to resemble the order of occurrence and account
for C99, C11 mode errors.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at
the db_blkptr, to prevent it from being changed by syncing context.
Reviewed by: Prakash Surya <[email protected]>
Reviewed by: George Wilson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7086
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/98fa317
Closes #5039
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix resolves warnings reported when compiling with different gcc
optimization levels in debug mode.
Test tools:
gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
Linux version: 2.6.32-573.18.1.el6.x86_64, Red Hat Enterprise Linux Server release 6.1 (Santiago)
List of warnings:
CFLAGS=-O1 ./configure --enable-debug ;make
../../module/icp/core/kcf_sched.c: In function ‘kcf_aop_done’:
../../module/icp/core/kcf_sched.c:499: error: ‘fg’ may be used uninitialized in this function
../../module/icp/core/kcf_sched.c:499: note: ‘fg’ was declared here
CFLAGS=-Os ./configure --enable-debug ; make
libzfs_dataset.c: In function ‘zfs_prop_set_list’:
libzfs_dataset.c:1575: error: ‘nvl_len’ may be used uninitialized in this function
Signed-off-by: GeLiXin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #5022
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARC will evict meta buffers that exceed the arc_meta_limit. Pending further
investigation into whether we should give special protection to meta buffers,
this tunable makes arc_meta_limit adjustable for different workloads.
People can set zfs_arc_meta_limit_percent to any value when loading zfs.ko,
so a range check is added to guarantee a suitable arc_meta_limit.
As suggested by Tim Chase, zfs_arc_dnode_limit is changed to a percent-style
tunable as well.
Signed-off-by: GeLiXin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4957
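Illustratively, the percent tunable resolves to a byte limit along these lines
(simplified):

    /* simplified: derive the byte limit from the percent-style tunable */
    arc_meta_limit = (arc_c_max * zfs_arc_meta_limit_percent) / 100;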
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As is the case with traverse_prefetch_thread(), the deep stacks caused
by traversal require disabling reclaim in the send traverse thread.
Also, do the same for receive_writer_thread() in which similar problems
have been observed.
Signed-off-by: Tim Chase <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4912
Closes #4998
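In ZoL this is typically done with the SPL fstrans helpers, sketched:

    fstrans_cookie_t cookie = spl_fstrans_mark();

    /* deep traversal / receive work that must not re-enter reclaim */

    spl_fstrans_unmark(cookie);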
|
|
|
|
|
|
|
|
|
|
|
| |
API Change: Module parameter set/get methods take a const parameter in
the Grsecurity kernel v4.7.1.
Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Jason Zaman <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4997
Closes #5001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).
dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:
dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()
This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.
Sponsored by: Intel Corp.
Signed-off-by: Matthew Ahrens <[email protected]>
Signed-off-by: Ned Bass <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7004
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/109
Closes #4641
Closes #4972
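Sketch of the intended usage (the *_by_dnode() signature shown is an
assumption for illustration):

    dnode_t *dn;
    uint64_t value;
    int err;

    VERIFY0(dnode_hold(os, object, FTAG, &dn));     /* hold the dnode once */
    err = zap_lookup_by_dnode(dn, name, 8, 1, &value);
    dnode_rele(dn, FTAG);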
|