Commit message (Author, Age, Files, Lines)
* Fix distribution detection for gentoo (Alexey Shvetsov, 2011-05-14, files: 2, lines: -8/+8)
| | | | | | | Also this may fix other distros because some of them also provide /etc/lsb-release not only ubuntu. Closes #244
* Update synchronous open zfs_close() comment (Brian Behlendorf, 2011-05-13, files: 1, lines: -1/+5)
| | | | | | | | | | | The comment in zfs_close() pertaining to decrementing the synchronous open count needs to be updated for Linux. The code was already updated to be correct, but the comment was missed and is now misleading. Under Linux the zfs_close() hook is only called once when the final reference is dropped. This differs from Solaris where zfs_close() is called for each close. Closes #237
* Remove root 'ls' after mount workaround (Alexey Shvetsov, 2011-05-12, files: 2, lines: -20/+0)
| | | | | | This workaround was introduced for issue #164. That issue was fixed by commit 5f35b19, so the workaround can be safely dropped from both the zfs.fedora and zfs.gentoo init scripts.
* Fix zfs.gentoo init script logic (Alexey Shvetsov, 2011-05-12, files: 1, lines: -1/+2)
| | | | | * Fix zfs.ko module check * Check 'zfs umount -a' return value
* Make zfs.gentoo init script more gentoo style. (Alexey Shvetsov, 2011-05-12, files: 1, lines: -72/+29)
| | | | | | | | * Improved compatibility with openrc * Removed LOCKFILE * Improved checksystem() function * Remove /etc/mtab check for / * General cleanup
* Merge pull request #235 from nedbass/rdev (Brian Behlendorf, 2011-05-09, files: 2, lines: -7/+16)
|\ | | | | Don't store rdev in SA for FIFOs and sockets
| * Don't store rdev in SA for FIFOs and sockets (Ned A. Bass, 2011-05-09, files: 2, lines: -7/+16)
| | | | | | | | | | | | | | | | | | | | | | | | Update the handling of named pipes and sockets to be consistent with other platforms with regard to the rdev attribute. While all ZFS implementations store the rdev for device files in a system attribute (SA), this is not the case for FIFOs and sockets. Indeed, Linux always passes rdev=0 to mknod() for FIFOs and sockets, so the value is not needed. Add an ASSERT that rdev==0 for FIFOs and sockets to detect if the expected behavior ever changes. Closes #216
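The rule described in this commit can be sketched in user-space C. This is only an illustration: the helper names `should_store_rdev` and `rdev_to_store` are hypothetical and are not taken from the ZFS source, and the sketch assumes nothing beyond the standard file-type macros from `<sys/stat.h>`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: only character and block device files carry a
 * meaningful rdev, so only they need it stored. */
static int should_store_rdev(mode_t mode)
{
    return S_ISCHR(mode) || S_ISBLK(mode);
}

/* Hypothetical helper mirroring the ASSERT from the commit message:
 * FIFOs and sockets are expected to arrive with rdev == 0. Returns the
 * rdev value that would be stored, or 0 when nothing is stored. */
static dev_t rdev_to_store(mode_t mode, dev_t rdev)
{
    if (S_ISFIFO(mode) || S_ISSOCK(mode))
        assert(rdev == 0);
    return should_store_rdev(mode) ? rdev : 0;
}
```

With this shape, a caller creating a FIFO with a non-zero rdev would trip the assertion, which is the "detect if the expected behavior ever changes" role the commit message describes.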
* | Disable direct reclaim for z_wr_* threads (Brian Behlendorf, 2011-05-06, files: 2, lines: -3/+7)
|/ | | | | | | | | | | | | | | | | The direct reclaim path in the z_wr_* threads must be disabled to ensure forward progress is always maintained for txg processing. This ensures that a txg will never get stuck waiting on itself because it entered the following memory reclaim callpath. ->prune_icache()->dispose_list()->zpl_clear_inode()->zfs_inactive() ->dmu_tx_assign()->dmu_tx_wait()->txg_wait_open() It would be preferable to target this exact code path but the kernel offers no way to do this without custom patches. To avoid this we are forced to disable all reclaim for these threads. It should not be necessary to do this for other z_* threads because they will not hold a txg open. Closes #232
* Handle NULL in nfsd .fsync() hook (Brian Behlendorf, 2011-05-06, files: 2, lines: -15/+21)
| | | | | | | | | | | | | | | | | | | | | | | | | | | How nfsd handles .fsync() has been changed a couple of times in recent kernels. But basically there are three cases we need to consider. Linux 2.6.12 - 2.6.33 * The .fsync() hook takes 3 arguments * The nfsd will call .fsync() with a NULL file struct pointer. Linux 2.6.34 * The .fsync() hook takes 3 arguments * The nfsd no longer calls .fsync() but instead uses sync_inode() Linux 2.6.35 - 2.6.x * The .fsync() hook takes 2 arguments * The nfsd no longer calls .fsync() but instead uses sync_inode() For once it looks like we've gotten lucky. The first two cases can actually be collapsed into one if we stop using the file struct pointer entirely. Since the dentry is still passed in both cases this is possible. The last case can then be safely handled by unconditionally using the dentry in the file struct pointer now that we know the nfsd caller has been removed. Closes #230
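The two hook shapes described in this commit can be mocked in user-space C. This is a sketch under stated assumptions, not the kernel code: the `*_sketch` structs are hypothetical stand-ins for the kernel's inode, dentry and file structures, and "syncing" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct inode_sketch  { int synced; };
struct dentry_sketch { struct inode_sketch *d_inode; };
struct file_sketch   { struct dentry_sketch *f_dentry; };

/* 3-argument hook shape: the file pointer may be NULL when nfsd is the
 * caller, so it is never dereferenced; only the dentry is used. */
static int fsync_3arg(struct file_sketch *filp,
                      struct dentry_sketch *dentry, int datasync)
{
    (void)filp;
    (void)datasync;
    dentry->d_inode->synced = 1;
    return 0;
}

/* 2-argument hook shape: nfsd no longer calls the hook, so the file
 * pointer is assumed non-NULL and the dentry is taken from it. */
static int fsync_2arg(struct file_sketch *filp, int datasync)
{
    (void)datasync;
    filp->f_dentry->d_inode->synced = 1;
    return 0;
}

/* Simulates the nfsd-style call with a NULL file pointer. */
static int demo_nfsd_style_call(void)
{
    struct inode_sketch ino = { 0 };
    struct dentry_sketch de = { &ino };
    int rc = fsync_3arg(NULL, &de, 0);
    return rc == 0 && ino.synced;
}

/* Simulates a regular call through the 2-argument hook. */
static int demo_two_arg_call(void)
{
    struct inode_sketch ino = { 0 };
    struct dentry_sketch de = { &ino };
    struct file_sketch f = { &de };
    int rc = fsync_2arg(&f, 0);
    return rc == 0 && ino.synced;
}
```

The point of the sketch is that the 3-argument form stays safe for a NULL file pointer precisely because it ignores that argument.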
* Fix awk usage (Brian Behlendorf, 2011-05-06, files: 2, lines: -4/+5)
| | | | | | | | | | | | | | | The zpool_id and zpool_layout helper scripts have been updated to use the more common /usr/bin/awk symlink. On Fedora/Redhat systems there are both /bin/awk and /usr/bin/awk symlinks to your installed version of awk. On Debian/Ubuntu systems only the /usr/bin/awk symlink exists. Additionally, add the '\<' token to the beginning of the regex pattern to prevent partial matches. This pattern only appears to work with gawk despite the mawk man page claiming to support this extended regex. Thus you will need to have gawk installed to use these optional helper scripts. A comment has been added to the script to reflect this reality.
* Use vmem_alloc() for zfs_ioc_pool_get_history() (Brian Behlendorf, 2011-05-06, files: 1, lines: -2/+2)
| | | | | | | | The default buffer size when requesting history is 128k. This is far too large for a kmem_alloc() so instead use the slower vmem_alloc(). This path has no performance concerns and the buffer is immediately free'd after its contents are copied to the user space buffer.
* Allow mounting of read-only snapshots [tag: zfs-0.6.0-rc4] (Brian Behlendorf, 2011-05-05, files: 1, lines: -3/+8)
| | | | | | | | With the addition of the mount helper we accidentally regressed the ability to manually mount snapshots. This commit updates the mount helper to expect the possibility of a ZFS_TYPE_SNAPSHOT. All snapshots will be automatically treated as 'legacy' type mounts so they can be mounted manually.
* Add missing ZFS tunables (Brian Behlendorf, 2011-05-04, files: 18, lines: -43/+178)
| This commit adds module options for all existing zfs tunables. Ideally the average user should never need to modify any of these values. However, in practice sometimes you do need to tweak these values for one reason or another. In those cases it's nice not to have to resort to rebuilding from source. All tunables are visible to modinfo and the list is as follows:
|
|   $ modinfo module/zfs/zfs.ko
|   filename: module/zfs/zfs.ko
|   license: CDDL
|   author: Sun Microsystems/Oracle, Lawrence Livermore National Laboratory
|   description: ZFS
|   srcversion: 8EAB1D71DACE05B5AA61567
|   depends: spl,znvpair,zcommon,zunicode,zavl
|   vermagic: 2.6.32-131.0.5.el6.x86_64 SMP mod_unload modversions
|   parm: zvol_major:Major number for zvol device (uint)
|   parm: zvol_threads:Number of threads for zvol device (uint)
|   parm: zio_injection_enabled:Enable fault injection (int)
|   parm: zio_bulk_flags:Additional flags to pass to bulk buffers (int)
|   parm: zio_delay_max:Max zio millisec delay before posting event (int)
|   parm: zio_requeue_io_start_cut_in_line:Prioritize requeued I/O (bool)
|   parm: zil_replay_disable:Disable intent logging replay (int)
|   parm: zfs_nocacheflush:Disable cache flushes (bool)
|   parm: zfs_read_chunk_size:Bytes to read per chunk (long)
|   parm: zfs_vdev_max_pending:Max pending per-vdev I/Os (int)
|   parm: zfs_vdev_min_pending:Min pending per-vdev I/Os (int)
|   parm: zfs_vdev_aggregation_limit:Max vdev I/O aggregation size (int)
|   parm: zfs_vdev_time_shift:Deadline time shift for vdev I/O (int)
|   parm: zfs_vdev_ramp_rate:Exponential I/O issue ramp-up rate (int)
|   parm: zfs_vdev_read_gap_limit:Aggregate read I/O over gap (int)
|   parm: zfs_vdev_write_gap_limit:Aggregate write I/O over gap (int)
|   parm: zfs_vdev_scheduler:I/O scheduler (charp)
|   parm: zfs_vdev_cache_max:Inflate reads small than max (int)
|   parm: zfs_vdev_cache_size:Total size of the per-disk cache (int)
|   parm: zfs_vdev_cache_bshift:Shift size to inflate reads too (int)
|   parm: zfs_scrub_limit:Max scrub/resilver I/O per leaf vdev (int)
|   parm: zfs_recover:Set to attempt to recover from fatal errors (int)
|   parm: spa_config_path:SPA config file (/etc/zfs/zpool.cache) (charp)
|   parm: zfs_zevent_len_max:Max event queue length (int)
|   parm: zfs_zevent_cols:Max event column width (int)
|   parm: zfs_zevent_console:Log events to the console (int)
|   parm: zfs_top_maxinflight:Max I/Os per top-level (int)
|   parm: zfs_resilver_delay:Number of ticks to delay resilver (int)
|   parm: zfs_scrub_delay:Number of ticks to delay scrub (int)
|   parm: zfs_scan_idle:Idle window in clock ticks (int)
|   parm: zfs_scan_min_time_ms:Min millisecs to scrub per txg (int)
|   parm: zfs_free_min_time_ms:Min millisecs to free per txg (int)
|   parm: zfs_resilver_min_time_ms:Min millisecs to resilver per txg (int)
|   parm: zfs_no_scrub_io:Set to disable scrub I/O (bool)
|   parm: zfs_no_scrub_prefetch:Set to disable scrub prefetching (bool)
|   parm: zfs_txg_timeout:Max seconds worth of delta per txg (int)
|   parm: zfs_no_write_throttle:Disable write throttling (int)
|   parm: zfs_write_limit_shift:log2(fraction of memory) per txg (int)
|   parm: zfs_txg_synctime_ms:Target milliseconds between tgx sync (int)
|   parm: zfs_write_limit_min:Min tgx write limit (ulong)
|   parm: zfs_write_limit_max:Max tgx write limit (ulong)
|   parm: zfs_write_limit_inflated:Inflated tgx write limit (ulong)
|   parm: zfs_write_limit_override:Override tgx write limit (ulong)
|   parm: zfs_prefetch_disable:Disable all ZFS prefetching (int)
|   parm: zfetch_max_streams:Max number of streams per zfetch (uint)
|   parm: zfetch_min_sec_reap:Min time before stream reclaim (uint)
|   parm: zfetch_block_cap:Max number of blocks to fetch at a time (uint)
|   parm: zfetch_array_rd_sz:Number of bytes in a array_read (ulong)
|   parm: zfs_pd_blks_max:Max number of blocks to prefetch (int)
|   parm: zfs_dedup_prefetch:Enable prefetching dedup-ed blks (int)
|   parm: zfs_arc_min:Min arc size (ulong)
|   parm: zfs_arc_max:Max arc size (ulong)
|   parm: zfs_arc_meta_limit:Meta limit for arc size (ulong)
|   parm: zfs_arc_reduce_dnlc_percent:Meta reclaim percentage (int)
|   parm: zfs_arc_grow_retry:Seconds before growing arc size (int)
|   parm: zfs_arc_shrink_shift:log2(fraction of arc to reclaim) (int)
|   parm: zfs_arc_p_min_shift:arc_c shift to calc min/max arc_p (int)
* Prep zfs-0.6.0-rc4 tag (Brian Behlendorf, 2011-05-03, files: 1, lines: -1/+1)
| | | | Create the fourth 0.6.0 release candidate tag (rc4).
* Add Gentoo/Lunar/Redhat Init Scripts (Brian Behlendorf, 2011-05-02, files: 7, lines: -10/+419)
| | | | | | | | | | | | Every distribution has slightly different requirements for their init scripts. Because of this the zfs package contains several init scripts for various distributions. These scripts have been contributed by, and are supported by, the larger zfs community. Init scripts for Gentoo/Lunar/Redhat have been contributed by: Gentoo - devsk <[email protected]> Lunar - Jean-Michel Bruenn <[email protected]> Redhat - Fajar A. Nugraha <[email protected]>
* Fully update inode when created (Brian Behlendorf, 2011-05-02, files: 1, lines: -2/+1)
| | | | | | | | | | | | | | When a new znode/inode pair is created both the znode and the inode should be immediately updated to the correct values. This was done for the znode and for most of the values in the inode, but not all of them. This normally wasn't a problem because most subsequent operations would cause the inode to be immediately updated. This change ensures the inode is now fully updated before it is inserted in to the inode hash. Closes #116 Closes #146 Closes #164
* Fix 'zfs set volsize=N pool/dataset' (Brian Behlendorf, 2011-05-02, files: 57, lines: -11/+167)
| This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg.
|
| The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels.
|
| Online ext3 resize test case passes on 2.6.28+ kernels:
|
|   $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
|   $ zpool create tank /tmp/zvol
|   $ zfs create -V 500M tank/zd0
|   $ mkfs.ext3 /dev/zd0
|   $ mkdir /mnt/zd0
|   $ mount /dev/zd0 /mnt/zd0
|   $ df -h /mnt/zd0
|   $ zfs set volsize=800M tank/zd0
|   $ resize2fs /dev/zd0
|   $ df -h /mnt/zd0
|
| Original-patch-by: Fajar A. Nugraha <[email protected]>
| Closes #68
| Closes #84
* Add zpl_export.c to the list of targets. (Alejandro R. Sedeño, 2011-04-29, files: 2, lines: -0/+2)
|
* Correct MAXUID (Brian Behlendorf, 2011-04-29, files: 1, lines: -1/+1)
| | | | | | | | | | The uid_t on most systems is in fact an unsigned 32-bit value. This is almost always correct, however you could compile your kernel to use an unsigned 16-bit value for uid_t. In practice I've never encountered a distribution which does this so I'm willing to overlook this corner case for now. Closes #165
* Implemented NFS export_operations. (Gunnar Beutner, 2011-04-29, files: 61, lines: -12/+273)
| | | | | Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.
* Suppress 'vdev_metaslab_init' memory warning (Brian Behlendorf, 2011-04-27, files: 1, lines: -1/+1)
| | | | | | | The vdev_metaslab_init() function has been observed to allocate larger than 8k chunks. However, they are not much larger than 8k and it does this infrequently so it is allowed and the warning is suppressed.
* Conserve stack in dsl_scan_visit() (Brian Behlendorf, 2011-04-26, files: 1, lines: -10/+15)
| | | | | | | | | | | | | | The dsl_scan_visit() function is a little heavy weight taking 464 bytes on the stack. This can be easily reduced for little cost by moving zap_cursor_t and zap_attribute_t off the stack and on to the heap. After this change dsl_scan_visit() has been reduced in size by 320 bytes. This change was made to reduce stack usage in the dsl_scan_sync() callpath which is recursive and has been observed to overflow the stack. Issue #174
* Conserve stack in dsl_scan_visitbp() (Brian Behlendorf, 2011-04-26, files: 1, lines: -5/+12)
| | | | | | | | | | | This function is called recursively so everything possible must be done to limit its stack consumption. The dprintf_bp() debugging function adds 30 bytes of local variables to the function, which we cannot afford. By commenting out this debugging we save 30 bytes per recursion and depths of 13 are not uncommon. This yields a total stack saving of 390 bytes on our 8k stack. Issue #174
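The saving quoted above is per-frame bytes multiplied by recursion depth. A minimal sketch of that arithmetic follows; the function name `stack_saved` is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total stack saved when each recursive frame
 * shrinks by bytes_per_frame and the recursion reaches depth frames. */
static size_t stack_saved(size_t bytes_per_frame, size_t depth)
{
    return bytes_per_frame * depth;
}
```

Using the figures from the commit message, `stack_saved(30, 13)` gives the 390 bytes mentioned there.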
* Conserve stack in dsl_scan_visitbp() (Brian Behlendorf, 2011-04-26, files: 1, lines: -2/+2)
| | | | | | | | | | | | | The recursive call chain dsl_scan_visitbp() -> dsl_scan_recurse() -> dsl_scan_visitdnode() -> dsl_scan_visitbp() has been observed to consume considerable stack resulting in a stack overflow (>8k). The cleanest way I see to fix this with minimal impact to the existing flow of code, and with the fewest performance concerns, is to always inline dsl_scan_recurse() and dsl_scan_visitdnode(). While this will increase the function size of dsl_scan_visitbp() by 4660 bytes, it also reduces the stack requirements by removing the function call overhead. Issue #174
* Merged pull request #212 from dajhorn/hostid. (Brian Behlendorf, 2011-04-26, files: 3, lines: -12/+5)
|\ | | | | Use gethostid in the Linux convention.
| * Use gethostid in the Linux convention. (Darik Horn, 2011-04-25, files: 3, lines: -12/+5)
| | | | | | | | | | | | | | | | | | | | Disable the gethostid() override for Solaris behavior because Linux systems implement the POSIX standard in a way that allows a negative result. Mask the gethostid() result to the lower four bytes, like coreutils does in /usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an eight byte long type. This can cause a spurious hostid mismatch that prevents zpool import on 64-bit systems.
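The masking step described in this commit can be sketched as follows. The helper name `hostid_low32` is hypothetical; the sketch only assumes that the raw value arrives as a `long`, which is the return type of POSIX `gethostid()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the lower four bytes of a hostid so
 * that sign-extension or junk in the upper half of an eight-byte long
 * cannot cause a spurious mismatch. */
static uint32_t hostid_low32(long raw)
{
    return (uint32_t)((unsigned long)raw & 0xffffffffUL);
}
```

A caller would pass the result of `gethostid()` through this helper before comparing it with a stored hostid; a negative raw value such as -1 then compares as 0xffffffff regardless of the width of `long`.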
* | Fix zvol deadlock (Brian Behlendorf, 2011-04-26, files: 1, lines: -1/+2)
|/ | | | | | | | | | | | It's possible for a zvol_write thread to enter direct memory reclaim while holding open a transaction group. This results in the system attempting to write out data to the disk to free memory. Unfortunately, this can't succeed because the thread doing reclaim is holding open the txg which must be closed to be synced to disk. To prevent this the offending allocation is marked KM_PUSHPAGE which will prevent it from attempting writeback. Closes #191
* Fix 32-bit MAXOFFSET_T definition (Brian Behlendorf, 2011-04-22, files: 1, lines: -7/+2)
| | | | | | | | | | | Having MAXOFFSET_T defined to 0x7fffffffl was artificially limiting the maximum file size on 32-bit systems. In reality MAXOFFSET_T is used when working with 'long long' types and as such we now define it as LLONG_MAX. This resolves the 2GB file size limit for files and additionally allows zvols greater than 2GB on 32-bit systems. Closes #136 Closes #81
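The effect of the two definitions can be compared with a small sketch. The macro names `OLD_MAXOFFSET_SKETCH` and `NEW_MAXOFFSET_SKETCH` and the helper `offset_allowed` are hypothetical; only the values 0x7fffffff and `LLONG_MAX` come from the commit message.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the old and new limits. */
#define OLD_MAXOFFSET_SKETCH 0x7fffffffLL /* 2 GiB - 1 */
#define NEW_MAXOFFSET_SKETCH LLONG_MAX

/* Hypothetical check: is a 'long long' offset within the limit? */
static int offset_allowed(long long offset, long long max_offset)
{
    return offset >= 0 && offset <= max_offset;
}
```

Under the old value an offset of 3 GiB is rejected even though a `long long` can represent it, which is the artificial 2GB ceiling the commit removes.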
* Fix spurious -EFAULT when setting I/O scheduler (Brian Behlendorf, 2011-04-22, files: 1, lines: -17/+12)
| | | | | | | | | | | | Occasionally we would see an -EFAULT returned when setting the I/O scheduler on a vdev. This was caused by an improperly formatted user mode helper command. This commit restructures the command to something simpler, allocates space for it dynamically to save stack, and removes the retry logic which is no longer needed. Closes #169
* Enforce ARC meta-data limits (Brian Behlendorf, 2011-04-21, files: 1, lines: -2/+15)
| | | | | | | | | | | | | | | This change ensures the ARC meta-data limits are enforced. Without this enforcement meta-data can grow to consume all of the ARC cache pushing out data and hurting performance. The cache is aggressively reclaimed but this is a soft and not a hard limit. The cache may exceed the set limit briefly before being brought under control. By default 25% of the ARC capacity can be used for meta-data. This limit can be tuned by setting the 'zfs_arc_meta_limit' module option. Once this limit is exceeded meta-data reclaim will occur in 3 percent chunks, or may be tuned using 'arc_reduce_dnlc_percent'. Closes #193
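The default limit described above is a simple fraction of the ARC capacity. A minimal sketch, with hypothetical helper names and covering only the 25% default and the over-limit test (not the reclaim itself):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the default meta-data limit is 25% of the ARC
 * capacity, per the commit message. */
static uint64_t default_meta_limit(uint64_t arc_capacity)
{
    return arc_capacity / 4;
}

/* Hypothetical helper: reclaim is warranted once usage exceeds the
 * limit. The limit is soft, so usage may briefly be above it. */
static int meta_over_limit(uint64_t meta_used, uint64_t limit)
{
    return meta_used > limit;
}
```

For a 4 GiB ARC this gives a 1 GiB default meta-data limit.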
* Fixed a use-after-free bug in zfs_zget(). (Gunnar Beutner, 2011-04-21, files: 1, lines: -1/+23)
| | | | | | | | | | Fixed a bug where zfs_zget could access a stale znode pointer when the inode had already been removed from the inode cache via iput -> iput_final -> ... -> zfs_zinactive but the corresponding SA handle was still alive. Signed-off-by: Brian Behlendorf <[email protected]> Closes #180
* Suppress 'zfs receive' memory warning (Brian Behlendorf, 2011-04-20, files: 1, lines: -1/+1)
| | | | | | | | | | As part of zfs_ioc_recv() a zfs_cmd_t is allocated in the kernel which is 17808 bytes in size. This sort of thing in general should be avoided. However, since this should be an infrequent event for now we allow it and simply suppress the warning with the KM_NODEBUG flag. This can be revisited later if/when it becomes an issue. Closes #178
* Added required runlevel info for init on Debian (Aniruddha Shankar, 2011-04-20, files: 1, lines: -0/+2)
| | | | | Signed-off-by: Brian Behlendorf <[email protected]> Closes #208
* Update zconfig.sh to use new zvol names (Brian Behlendorf, 2011-04-19, files: 1, lines: -82/+99)
| | | | | | | | | | | | | | | | | | | This change should have occurred when we committed the new udev rules for zvols. Basically, the test script is just out of date. We need to update it to use the /dev/zvol/ device names, and to expect the more common -partN suffixes. I added a udev_trigger() call in zconfig_partition() and zconfig_zvol_device_stat() to ensure that all the udev rules have run before continuing. This ensures the devices are available to subsequent commands and closes a small race. Finally, I was forced to add a small 'sleep 1' to test 10. I was observing occasional failures in my VM due to the device still claiming to be busy. Delaying between the various methods of adding/removing a vdev avoids the issue. Closes #207
* Add parted and lsscsi dependencies to zfs-test (Brian Behlendorf, 2011-04-19, files: 1, lines: -0/+1)
| | | | | | | | | | | | | The zfault.sh and zconfig.sh test scripts require the parted utility, the lsscsi utility, and the scsi_debug module. To ensure the utilities are available they have been added as dependencies to the zfs-test package. Checking for scsi_debug is a little more problematic because if it's missing you will need to build it. For clarity the documentation has been updated to mention this. Closes #205 Closes #206
* Add Gunnar Beutner to AUTHORS for his contributions (Brian Behlendorf, 2011-04-19, files: 1, lines: -0/+1)
|
* Truncate the xattr znode when updating existing attributes. (Gunnar Beutner, 2011-04-19, files: 1, lines: -1/+7)
| | | | | | | | If the attribute's new value was shorter than the old one the old code would leave parts of the old value in the xattr znode. Signed-off-by: Brian Behlendorf <[email protected]> Closes #203
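The failure mode fixed here has a close analogue with ordinary files, which the following user-space sketch uses as a stand-in. It is an analogy only, not the ZFS xattr code: the helper names are hypothetical, a temporary file under /tmp plays the role of the xattr znode, and `O_TRUNC` plays the role of the truncation the commit adds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write value at offset 0 of an existing file,
 * truncating it first only when asked. Returns 0 on success. */
static int write_value(const char *path, const char *value,
                       int truncate_first)
{
    int flags = O_WRONLY | (truncate_first ? O_TRUNC : 0);
    int fd = open(path, flags);
    if (fd < 0)
        return -1;
    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Store a 14-byte value, then overwrite it with a 3-byte one, and
 * return the final file size (or -1 on error). */
static long size_after_update(int truncate_first)
{
    char path[] = "/tmp/xattr_sketch_XXXXXX";
    struct stat st;
    long size = -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    if (write_value(path, "long-old-value", 0) == 0 &&
        write_value(path, "new", truncate_first) == 0 &&
        stat(path, &st) == 0)
        size = (long)st.st_size;
    unlink(path);
    return size;
}
```

Without truncation the file keeps its old 14-byte length, so the tail of the old value survives after the new one; with truncation the size matches the new value.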
* Added missing initialization for va.va_dentry in zfs_get_xattrdir. (Gunnar Beutner, 2011-04-19, files: 1, lines: -0/+1)
| | | | | | | | | | | | Without this we may mistakenly believe we have a dentry and try to d_instantiate() it. This will result in the following BUG. It's important to note that while the xattr directory has an inode associated with it we never create a dentry for it. kernel BUG at fs/dcache.c:1418! Signed-off-by: Brian Behlendorf <[email protected]> Closes #202
* Support IEC base-2 prefixes (Richard Laager, 2011-04-19, files: 1, lines: -4/+7)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Cleanup various Sun/Solaris-isms (Richard Laager, 2011-04-19, files: 3, lines: -99/+11)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Update the version in the zpool upgrade example (Richard Laager, 2011-04-19, files: 1, lines: -1/+1)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Normalize the deferred destruction language (Richard Laager, 2011-04-19, files: 1, lines: -3/+3)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Improve the wording about hot spares (Richard Laager, 2011-04-19, files: 1, lines: -2/+2)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Improve some quoting consistency (Richard Laager, 2011-04-19, files: 1, lines: -3/+3)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Remove a stray tab (Richard Laager, 2011-04-19, files: 1, lines: -1/+1)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Linux has "partitions", not "slices" (Richard Laager, 2011-04-19, files: 1, lines: -2/+2)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Use Linux disk names in zpool.8 (Richard Laager, 2011-04-19, files: 1, lines: -25/+25)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* More and correct an example in zpool.8 (Richard Laager, 2011-04-19, files: 1, lines: -1/+1)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Change /dev/dsk -> /dev in the man pages (Richard Laager, 2011-04-19, files: 2, lines: -5/+5)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>
* Correct man page section numbers for Linux (Richard Laager, 2011-04-19, files: 3, lines: -22/+22)
| | | | Signed-off-by: Brian Behlendorf <[email protected]>