summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Change functions which return literals to return `const char*`Tomohiro Kusumi2018-03-094-5/+5
| | | | | | | | | | | | | | get_format_prompt_string() and zpool_state_to_name() return a string literal which is read-only, thus they should return `const char*`. zpool_get_prop_string() returns a non-const string after successful nv-lookup, and returns a string literal otherwise. Since this function is designed to be used for read-only purpose, the return type should also be `const char*`. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #7285
* QAT support for AES-GCMTom Caputi2018-03-0910-232/+758
| | | | | | | | | | This patch adds support for acceleration of AES-GCM encryption with Intel Quick Assist Technology. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chengfeix Zhu <[email protected]> Signed-off-by: Weigang Li <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7282
* zdb and inuse tests don't pass with real disksPaul Zuchowski2018-03-078-15/+59
| | | | | | | | | | Due to zpool create auto-partioning in Linux (i.e. sdb1), certain utilities need to use the parition (sdb1) while others use the whole disk name (sdb). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Zuchowski <[email protected]> Closes #6939 Closes #7261
* Take user namespaces into account in policy checksWolfgang Bumiller2018-03-0717-7/+521
| | | | | | | | | | | | | | | | | | | | | | | | | Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). This also adds an initial user namespace regression test for the setgid bit loss, with a user_ns_exec helper usable in further tests. Additionally, configure checks for the required user namespace related features are added for: * ns_capable * kuid/kgid_has_mapping() * user_ns in cred_t Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Wolfgang Bumiller <[email protected]> Closes #6800 Closes #7270
* ZTS: fix send-c_stream_size_estimateBrian Behlendorf2018-03-071-0/+1
| | | | | | | | | | | | The test could fail when attempting to write to a newly created volume which was missing its device node. Resolve the issue by calling block_device_wait() which blocks until udev creates the needed entry. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7276 Closes #7277
* Fix dbufstats_001_posGiuseppe Di Natale2018-03-072-4/+28
| | | | | | | | | | | | | | Implement a new helper within_tolerance to test if a value is within range of a target. Because the dbufstats and dbufs kstat file are being read at slightly different times, it is possible for stats to be slightly off. Use within_tolerance to determine if the value is "close enough" to the target. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7239 Closes #7266
* Allow to limit zed's syslog chattinessTony Hutter2018-03-069-7/+178
| | | | | | | | | | | | | | | | | | Some usage patterns like send/recv of replication streams can produce a large number of events. In such a case, the current all-syslog.sh zedlet will hold up to its name, and flood the logs with mostly redundant information. Two mitigate this situation, this changeset introduces to new variables ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE to zed.rc that give more control over which event classes end up in the syslog. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Signed-off-by: Daniel Kobras <[email protected]> Closes #6886 Closes #7260
* Record skipped MMP writes in multihost_historyOlaf Faaland2018-03-0611-50/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once per pass through the MMP thread's loop, the vdev tree is walked to find a suitable leaf to write the next MMP block to. If no such leaf is found, the thread sleeps for a while and resumes at the top of the loop. Add an entry to multihost_history when no leaf can be found, and record the reason in the error column. The error code for such entries is a bitfield, displayed in hex: 0x1 At least one vdev (interior or leaf) was not writeable. 0x2 At least one writeable leaf vdev was found, but it had a pending MMP write. timestamp = the time in seconds since the epoch when no leaf could be found originally. duration = the time (in ns) during which no MMP block was written for this reason. This does not include the preceeding inter-write period nor the following inter-write period. vdev_guid = the number of sequential cycles of the MMP thread looop when this occurred. Sample output, truncated to fit: For records of skipped MMP writes the right-most column, vdev_path, is reported as "-". id txg timestamp error duration mmp_delay vdev_guid ... 936 11 1520036441 0 146264 891422313 1740883117838 ... 937 11 1520036441 0 163956 888356657 7320395061548 ... 938 11 1520036442 0 130690 885314969 7320395061548 ... 939 11 1520036442 0 2001068577 882296582 1740883117838 ... 940 11 1520036443 0 161806 882296582 7320395061548 ... 941 11 1520036443 0x2 0 998020546 1 ... 942 11 1520036444 0 136585 998020546 7320395061548 ... 943 11 1520036444 0x2 0 998020257 1 ... 944 11 1520036445 5 2002662964 994160219 1740883117838 ... 945 11 1520036445 0x2 998073118 994160219 3 ... 946 11 1520036447 0 247136 994160219 7320395061548 ... Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7212
* Detect long config lock acquisition in mmpOlaf Faaland2018-03-061-0/+6
| | | | | | | | | | | | | | If something holds the config lock as a writer for too long, MMP will fail to issue MMP writes in a timely manner. This will result either in the pool being suspended, or in an extreme case, in the pool not being protected. If the time to acquire the config lock exceeds 1/10 of the minimum zfs_multihost_interval, report it in the zfs debug log. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7212
* Introduce a destroy_dataset helperGiuseppe Di Natale2018-03-064-24/+47
| | | | | | | | | | | | | Datasets can be busy when calling zfs destroy. Introduce a helper function to destroy datasets and use it to destroy datasets in zfs_allow_004_pos, zfs_promote_008_pos, and zfs_destroy_002_pos. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7224 Closes #7246 Closes #7249 Closes #7267
* Misc fixes and cleanup for project quotaNasf-Fan2018-03-054-6/+9
| | | | | | | | | | | | | | | | | | | | | | | 1) The Coverity Scan reports some issues for the project quota patch, including: 1.1) zfs_prop_get_userquota() directly uses the const quota type value as the condition check by wrong. 1.2) dmu_objset_userquota_get_ids() may cause dnode::dn_newgid to be overwritten by dnode::dn->dn_oldprojid. 2) This patch fixes related issues. It also enhances the logic for zfs_project_item_alloc() to avoid buffer overflow. 3) Skip project quota ability check if does not change project quota related things (id or flag). Otherwise, it will cause chattr (for other non project quota flags) operation failed if project quota disabled. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Fan Yong <[email protected]> Closes #7251 Closes #7265
* Linux 4.16 compat: get_disk_and_module()Giuseppe Di Natale2018-03-054-1/+29
| | | | | | | | | | As of https://github.com/torvalds/linux/commit/fb6d47a, get_disk() is now get_disk_and_module(). Add a configure check to determine if we need to use get_disk_and_module(). Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7264
* Change checksum & IO delay ratelimit valuesTony Hutter2018-03-046-15/+56
| | | | | | | | | | | | | | | | Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec. This allows zed to actually trigger if a bunch of these events arrive in a short period of time (zed has a threshold of 10 events in 10 sec). Previously, if you had, say, 100 checksum errors in 1 sec, it would get ratelimited to 5/sec which wouldn't trigger zed to fault the drive. Also, convert the checksum and IO delay thresholds to module params for easy testing. Reviewed-by: loli10K <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7252
* Increment zil_itx_needcopy_bytes properlychrisrd2018-03-021-2/+1
| | | | | | | | | | | | | | | | | In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw) into the log write block (lwb) and send it off using zil_lwb_add_txg(). If we also have WR_NEED_COPY, we additionally copy the lwr's data into the lwb to be sent off. If the lwr + data doesn't fit into the lwb, we send the lrw and as much data as will fit (dnow bytes), then go back and do the same with the remaining data. Each time through this loop we're sending dnow data bytes. I.e. zil_itx_needcopy_bytes should be incremented by dnow. Reviewed-by: Richard Elling <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #6988 Closes #7176
* ZTS: fix spurious failures in mv_fileschrisrd2018-03-021-14/+3
| | | | | | | | | | | | | | | The test could fail because of a race condition between the files being generated in the background and attempting to move the files. Wait for all file generation to complete before trying to move the files around. Also, clean up the waiting: the 'wait' command without arguments waits for all child pids. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7220 Closes #7242 Closes #7258
* Add ZFS perf test for dbuf cacheJohn Wren Kennedy2018-02-287-5/+118
| | | | | | | | | | | | | This change adds a test for sequential reads out of the dbuf cache. It's essentially a copy of sequential_reads_cached, using a smaller data set. The sequential read tests are renamed to differentiate them. Authored by: Dan Kimmel <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Matt Ahrens <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: John Wren Kennedy <[email protected]> Closes #7225
* Fix some typosJohn Eismeier2018-02-2816-30/+30
| | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: George Melikov <[email protected]> Signed-off-by: John Eismeier <[email protected]> Closes #7237
* Fix zpool(8) list example to match actual formatTomohiro Kusumi2018-02-281-10/+10
| | | | | | | | | | | | | a05dfd00 (Illumos 5147) has swapped FRAG and EXPANDSZ, so it's natural to modify these examples. # zpool list | head -1 NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT ^^^^^^^^^^^^^^^ Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tomohiro Kusumi <[email protected]> Closes #7244
* Add Python 3 rewrite of arc_summary.pyScot W. Stevenson2018-02-286-2/+918
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new script arc_summary3.py as a complete rewrite of the arc_summary.py tool (see issue #6873) Add new options: -g/--graph - Display crude graphic representation of ARC status and quit -r/--raw - Print all available information as minimally formatted list (for grep) -s/--section - Print a single section. This replaces -p/--page, which is kept for backwards use but marked as depreciated Add new sections with information on ZIL and SPL. Notify user if sections L2ARC and VDEV are skipped instead of failing silently. Add warning that -p/--page option is depreciated. Developed for Python 3.5. Reviewed-by: Richard Laager <[email protected]> Reviewed-by: Richard Elling <[email protected]> Reviewed by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Scot W. Stevenson <[email protected]> Closes #6873 Closes #6892
* Add SMART self-test results to zpool status -cTony Hutter2018-02-277-18/+133
| | | | | | | | | | | | | Add in SMART self-test results to zpool status|iostat -c. This works for both SAS and SATA drives. Also, add plumbing to allow the 'smart' script to take smartctl output from a directory of output text files instead of running it against the vdevs. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #7178
* Raw DRR_OBJECT records must write raw dataTom Caputi2018-02-274-55/+72
| | | | | | | | | | | | | | | | | | | | | | b1d21733 made it possible for empty metadnode blocks to be compressed to a hole, fixing a bug that would cause invalid metadnode MACs when a send stream attempted to free objects and allowing the blocks to be reclaimed when they were no longer needed. However, this patch also introduced a race condition; if a txg sync occurred after a DRR_OBJECT_RANGE record was received but before any objects were added, the metadnode block would be compressed to a hole and lose all of its encryption parameters. This would cause subsequent DRR_OBJECT records to fail when they attempted to write their data into an unencrypted block. This patch defers the DRR_OBJECT_RANGE handling to receive_object() so that the encryption parameters are set with each object that is written into that block. Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7215 Closes #7236
* Incorrect maximum DVA value in DDE_GET_NDVAS()Tim Chase2018-02-261-1/+1
| | | | | | | | | | The conditional was reversed which caused garbage values to be used when calculating dds_ref_dsize. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tom Caputi <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Signed-off-by: Tim Chase <[email protected]> Closes #7234
* Fix segfault in zfs_do_bookmark()LOLi2018-02-262-2/+31
| | | | | | | | | | | | | | When invoked with wrong parameters 'zfs bookmark' fails to gracefully validate user input and crashes. This is a regression accidentally introduced in 587e228; this commit adds additional tests to the ZFS Test Suite to exercise this codepath. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: KireinaHoro <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7228 Closes #7229
* ZTS: Fix zfs_share_* test case failuresBrian Behlendorf2018-02-241-1/+7
| | | | | | | | | | | | Prevent false positives when running the zfs_share_* test cases due to leftover stale /var/lib/nfs/etab entries. When starting the test group re-synchronize the /var/lib/nfs/etab file with /etc/exports. At this point in the testing there will be no additional `zfs share` entries to add. Reviewed by: George Melikov <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7226
* Shellcheck cleanup for initrd scriptsKash Pande2018-02-236-101/+98
| | | | | | | | | Reviewed-by: Brian Behlendorf <[email protected]> Co-authored-by: Kash Pande <[email protected]> Co-authored-by: Matthew Thode <[email protected]> Signed-off-by: Kash Pande <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7214
* Enable booting from nested encrypted datasetsKash Pande2018-02-233-33/+82
| | | | | | | | | | | | - enable booting from nested encrypted datasets - fix plymouth boot splash passphrase entry - optimize unlock process Co-authored-by: Kash Pande <[email protected]> Co-authored-by: Matthew Thode <[email protected]> Signed-off-by: Kash Pande <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7214
* Add scrub after resilver zed scriptTony Hutter2018-02-2316-21/+187
| | | | | | | | | | | | | | | | | | | | * Add a zed script to kick off a scrub after a resilver. The script is disabled by default. * Add a optional $PATH (-P) option to zed to allow it to use a custom $PATH for its zedlets. This is needed when you're running zed under the ZTS in a local workspace. * Update test scripts to not copy in all-debug.sh and all-syslog.sh by default. They can be optionally copied in as part of zed_setup(). These scripts slow down zed considerably under heavy events loads and can cause events to be dropped or their delivery delayed. This was causing some sporadic failures in the 'fault' tests. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Richard Laager <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #4662 Closes #7086
* Fix free memory calculation on v3.14+chrisrd2018-02-238-41/+255
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide infrastructure to auto-configure to enum and API changes in the global page stats used for our free memory calculations. arc_free_memory has been broken since an API change in Linux v3.14: 2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node 2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node vmstats These commits moved some of global_page_state() into global_node_page_state(). The API change was particularly egregious as, instead of breaking the old code, it silently did the wrong thing and we continued using global_page_state() where we should have been using global_node_page_state(), thus indexing into the wrong array via NR_SLAB_RECLAIMABLE et al. There have been further API changes along the way: 2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to node counters 2017-09-06 v4.14 c41f012a mm: rename global_page_state to global_zone_page_state ...and various (incomplete, as it turns out) attempts to accomodate these changes in ZoL: 2017-08-24 2209e409 Linux 4.8+ compatibility fix for vm stats 2017-09-16 787acae0 Linux 3.14 compat: IO acct, global_page_state, etc 2017-09-19 661907e6 Linux 4.14 compat: IO acct, global_page_state, etc The config infrastructure provided here resolves these issues going back to the original API change in v3.14 and is robust against further Linux changes in this area. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7170
* Report duration and error in mmp_history entriesOlaf Faaland2018-02-225-13/+60
| | | | | | | | | | | | | | | | | | | | | | After an MMP write completes, update the relevant mmp_history entry with the time between submission and completion, and the error status of the write. [faaland1@toss3a zfs]$ cat /proc/spl/kstat/zfs/pool/multihost 39 0 0x01 100 8800 69147946270893 72723903122926 id txg timestamp error duration mmp_delay vdev_guid 10607 1166 1518985089 0 138301 637785455 4882... 10608 1166 1518985089 0 136154 635407747 1151... 10609 1166 1518985089 0 803618560 633048078 9740... 10610 1166 1518985090 0 144826 633048078 4882... 10611 1166 1518985090 0 164527 666187671 1151... Where duration = gethrtime_in_done_fn - gethrtime_at_submission, and error = zio->io_error. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7190
* Do not initiate MMP writes while pool is suspendedOlaf Faaland2018-02-221-1/+1
| | | | | | | | | | | While the pool is suspended on host A, it may be imported on host B. If host A continued to write MMP blocks, it would be blindly overwriting MMP blocks written by host B, and the blocks written by host A would have outdated txg information. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7182
* Linux 4.16 compat: use correct *_dec_and_test()Tony Hutter2018-02-223-1/+26
| | | | | | | | | | Use refcount_dec_and_test() on 4.16+ kernels, atomic_dec_and_test() on older kernels. https://lwn.net/Articles/714974/ Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes: #7179 Closes: #7211
* Fix bounds check in zio_crypt_do_objset_hmacsTom Caputi2018-02-221-4/+8
| | | | | | | | | | | | The current bounds check in zio_crypt_do_objset_hmacs() does not properly handle the possible sizes of the objset_phys_t and can therefore read outside the buffer's memory. If that memory happened to match what the check was actually looking for, the objset would fail to be owned, complaining that the MAC was invalid. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7210
* OpenZFS 9035 - zfs: this statement may fall throughToomas Soome2018-02-214-2/+4
| | | | | | | | | | | | | Authored by: Toomas Soome <[email protected]> Reviewed by: Yuri Pankov <[email protected]> Reviewed by: Andy Fiddaman <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/9035 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/46ac8fdfc5 Closes #7206
* Allow modprobe to fail when called within systemdMatthew Thode2018-02-212-2/+2
| | | | | | | | | | This allows for systems with zfs built into the kernel manually to run these services. Otherwise the service will fail to start. Reviewed-by: loli10K <[email protected]> Reviewed-by: Kash Pande <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Matthew Thode <[email protected]> Closes #7174
* Add SMART attributes for SSD and NVMebunder20152018-02-213-2/+24
| | | | | | | | | | | | This adds the SMART attributes required to probe Samsung SSD and NVMe (and possibly others) disks when using the "zpool status -c" command. Reviewed-by: loli10K <[email protected]> Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: bunder2015 <[email protected]> Closes #7183 Closes #7193
* Allow make checkstyle and paxscript in build dirchrisrd2018-02-211-6/+7
| | | | | | | Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7202
* Want 'zfs send -b'LOLi2018-02-219-120/+257
| | | | | | | | | | | | | | | This change implements 'zfs send -b' which can be used to send only received property values whether or not they are overridden by local settings. This can be very useful during "restore" operations from a backup pool because it allows to send only the property values originally sent from the backup source, even though they were later modified on the destination either by a 'zfs set' operation, explicit 'zfs inherit' or overridden during the receive process via 'zfs receive -o|-x'. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: loli10K <[email protected]> Closes #7156
* Raw receive should change key atomicallyTom Caputi2018-02-215-225/+312
| | | | | | | | | | | | | | | | | Currently, raw zfs sends transfer the encrypted master keys and objset_phys_t encryption parameters in the DRR_BEGIN payload of each send file. Both of these are processed as soon as they are read in dmu_recv_stream(), meaning that the new keys are set before the new snapshot is received. In addition to the fact that this changes the user's keys for the dataset earlier than they might expect, the keys were never reset to what they originally were in the event that the receive failed. This patch splits the processing into objset handling and key handling, the later of which is moved to dmu_recv_end() so that they key change can be done atomically. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7200
* Prevent raw zfs recv -F if dataset is unencryptedTom Caputi2018-02-212-12/+33
| | | | | | | | | | | | | | | | | | | | | | The current design of ZFS encryption only allows a dataset to have one DSL Crypto Key at a time. As a result, it is important that the zfs receive code ensures that only one key can be in use at a time for a given DSL Directory. zfs receive -F complicates this, since the new dataset is received as a clone of the existing one so that an atomic switch can be done at the end. To prevent confusion about which dataset is actually encrypted a check was added to ensure that encrypted datasets cannot use zfs recv -F to completely replace existing datasets. Unfortunately, the check did not take into account unencrypted datasets being overriden by encrypted ones as a case. Along the same lines, the code also failed to ensure that raw recieves could not be done on top of existing unencrypted datasets, which causes amny problems since the new stream cannot be decrypted. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7199
* Raw receives must compress metadnode blocksTom Caputi2018-02-217-41/+24
| | | | | | | | | | | | | | | | | | | Currently, the DMU relies on ZIO layer compression to free LO dnode blocks that no longer have objects in them. However, raw receives disable all compression, meaning that these blocks can never be freed. In addition to the obvious space concerns, this could also cause incremental raw receives to fail to mount since the MAC of a hole is different from that of a completely zeroed block. This patch corrects this issue by adding a special case in zio_write_compress() which will attempt to compress these blocks to a hole even if ZIO_FLAG_RAW_ENCRYPT is set. This patch also removes the zfs_mdcomp_disable tunable, since tuning it could cause these same issues. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7198
* Remove unnecessary txg syncs from receive_object()Tom Caputi2018-02-211-1/+6
| | | | | | | | | | | | 1b66810b introduced serveral changes which improved the reliability of zfs sends when large dnodes were involved. However, these fixes required adding a few calls to txg_wait_synced() in the DRR_OBJECT handling code. Although most of them are currently necessary, this patch allows the code to continue without waiting in some cases where it doesn't have to. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7197
* Add omitted set for os->os_next_write_rawTom Caputi2018-02-211-1/+4
| | | | | | | | | | | This one line patch adds adds a set to os->os_next_write_raw that was omitted when the code was updated in 1b66810. Without it, the code (in some instances) could attempt to write raw encrypted data as regular unencrypted data without the keys being loaded, triggering an ASSERT in zio_encrypt(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #7196
* Correct count_uberblocks in mmp.kshlibGiuseppe Di Natale2018-02-201-1/+1
| | | | | | | | | | | A log_must call was causing count_uberblocks to return more than just the uberblock count. Remove the log_must since it was only logging a sleep. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Olaf Faaland <[email protected]> Reviewed-by: loli10K <[email protected]> Signed-off-by: Giuseppe Di Natale <[email protected]> Closes #7191
* ZIL claiming should not start user accountingTom Caputi2018-02-205-38/+44
| | | | | | | | | | | | | | | | | | | | Currently, ZIL claiming dirties objsets which causes dsl_pool_sync() to attempt to perform user accounting on them. This causes problems for encrypted datasets that were raw received before the system went offline since they cannot perform user accounting until they have their keys loaded. This triggers an ASSERT in zio_encrypt(). Since encryption was added, the code now depends on the fact that data should only be written when objsets are owned. This patch adds a check in dmu_objset_do_userquota_updates() to ensure that useraccounting is only done when the objsets are actually owned for write. As part of this work, the zfsvfs and zvol code was updated so that it no longer lies about owning objsets readonly. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tom Caputi <[email protected]> Closes #6916 Closes #7163
* Fix coverity defects: zfs channel programsDon Brady2018-02-2014-19/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CID 173243, 173245: Memory - corruptions (OVERRUN) Added size argument to lcompat_sprintf() to avoid use of INT_MAX CID 173244: Integer handling issues (OVERFLOW_BEFORE_WIDEN) Added cast to uint64_t to avoid a 32 bit overflow warning CID 173242: Integer handling issues (CONSTANT_EXPRESSION_RESULT) Conditionally removed unused luai_numisnan() floating point check CID 173241: Resource leaks (RESOURCE_LEAK) Added missing close(fd) on error path CID 173240: (UNINIT) Fixed uninitialized variable in get_special_prop() CID 147560: Null pointer dereferences (NULL_RETURNS) Cleaned up bad code merge in dsl_dataset_promote_check() CID 28475: Memory - illegal accesses (OVERRUN) Fixed lcompat_sprintf() to use a size paramater CID 28418, 28422: Error handling issues (CHECKED_RETURN) Added function result cast to (void) to avoid warning CID 23935, 28411, 28412: Memory - corruptions (ARRAY_VS_SINGLETON) Added casts to avoid exposing result as an array Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7181
* Project dnode should be protected by local MACTom Caputi2018-02-201-12/+26
| | | | | | | | | | | | This patch corrects a small security issue with 9c5167d1. When the project dnode was added to the objset_phys_t, it was not included in the local MAC for cryptographic protection, allowing an attacker to modify this data without the consent of the key holder. This patch does represent an on-disk format change for anyone using project dnodes on an encrypted dataset. Signed-off-by: Tom Caputi <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #7177
* Fix config issues: frame size and headerschrisrd2018-02-1511-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. With various (debug and/or tracing?) kernel options enabled it's possible for 'struct inode' and 'struct super_block' to exceed the default frame size, leaving errors like this in config.log: build/conftest.c:116:1: error: the frame size of 1048 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Fix this by removing the frame size warning for config checks 2. Without the correct headers included, it's possible for declarations to be missed, leaving errors like this in the config.log: build/conftest.c:131:14: error: ‘struct nameidata’ declared inside parameter list [-Werror] Fix this by adding appropriate headers. Note: Both these issues can result in silent config failures because the compile failure is taken to mean "this option is not supported by this kernel" rather than "there's something wrong with the config test". This can lead to something merely annoying (compile failures) to something potentially serious (miscompiled or misused kernel primitives or functions). E.g. the fixes included here resulted in these additional defines in zfs_config.h with linux v4.14.19: Also, drive-by whitespace fixes in config/* files which don't mention "GNU" (those ones look to be imported from elsewhere so leave them alone). Reviewed by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Dunlop <[email protected]> Closes #7169
* Address objtool check failures in lua moduleDon Brady2018-02-151-1/+3
| | | | | | | | | | | | The use of void __attribute__((noreturn)) in kernel builds was causing lots of warnings if CONFIG_STACK_VALIDATION is active. For now we just remove this attribute to achieve clean builds for the Lua module. There was no significant increase in the time to run the full channel_program ZTS tests. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes #7173
* Clarify zinject(8) explanation of -eOlaf Faaland2018-02-151-0/+3
| | | | | | | | | | | Error injection of EIO or ENXIO simply sets the zio's io_error value, rather than preventing the read or write from occurring. This is important information as it affects how the probes must be used. Reviewed-by: Giuseppe Di Natale <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Olaf Faaland <[email protected]> Closes #7172
* OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zioGeorge Wilson2018-02-142-20/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ======= It's possible for a parent zio to complete even though it has children which have not completed. This can result in the following panic: > $C ffffff01809128c0 vpanic() ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80) ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80) ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0, ffffff3373370908) ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0) ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0) ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140) ffffff0180912b40 thread_start+8() > ::status debugging crash dump vmcore.2 (64-bit) from batfs0390 operating system: 5.11 joyent_20170911T171900Z (i86pc) image uuid: (not set) panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80 owner=ffffff3c59b39480 thread=ffffff0180912c40 dump content: kernel pages only The problem is that dbuf_prefetch along with l2arc can create a zio tree which confuses the parent zio and allows it to complete with while children still exist. Here's the scenario: zio tree: pio |--- lio The parent zio, pio, has entered the zio_done stage and begins to check its children to see there are still some that have not completed. In zio_done(), the children are checked in the following order: zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE) zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE) If pio, finds any child which has not completed then it stops executing and goes to sleep. Each call to zio_wait_for_children() will grab the io_lock while checking the particular child. In this scenario, the pio has completed the first call to zio_wait_for_children() to check for any ZIO_CHILD_VDEV children. Since the only zio in the zio tree right now is the logical zio, lio, then it completes that call and prepares to check the next child type. In the meantime, the lio completes and in its callback creates a child vdev zio, cio. The zio tree looks like this: zio tree: pio |--- lio |--- cio The lio then grabs the parent's io_lock and removes itself. zio tree: pio |--- cio The pio continues to run but has already completed its check for ZIO_CHILD_VDEV and will erroneously complete. When the child zio, cio, completes it will panic the system trying to reference the parent zio which has been destroyed. SOLUTION ======== The fix is to rework the zio_wait_for_children() logic to accept a bitfield for all the children types that it's interested in checking. The io_lock will is held the entire time we check all the children types. Since the function now accepts a bitfield, a simple ZIO_CHILD_BIT() macro is provided to allow for the conversion between a ZIO_CHILD type and the bitfield used by the zio_wiat_for_children logic. Authored by: George Wilson <[email protected]> Reviewed by: Matthew Ahrens <[email protected]> Reviewed by: Andriy Gapon <[email protected]> Reviewed by: Youzhong Yang <[email protected]> Reviewed by: Brian Behlendorf <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Giuseppe Di Natale <[email protected]> OpenZFS-issue: https://www.illumos.org/issues/8857 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/862ff6d99c Issue #5918 Closes #7168