- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed. Deadlock.
Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().
Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.
Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
top bit in block addresses.
Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
this condition can occur if ufs_inactive experiences failure attempting
to write the inode out. Instead, have ufs_reclaim always call VOP_UPDATE
which will only write out the inode if there are unflushed changes
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.
Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.
Inspired by similar changes in OpenBSD, but implemented differently.
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.
Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.
Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86
Clean up some crappy pointer frobnication.
rarely in the normal case. (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)
Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.
Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)
And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.
Welcome to 2.0F.
Approved by: Jason R. Thorpe <thorpej@netbsd.org>
no longer use and/or need it
- removed casts from unionfs, deadfs and fdesc
(there are more to hunt down still)
- changed vfs_quotactl args argumet from caddr_t to void *
- changed vfs_quotactl structures/callers to reflect the api change
Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
VOP_STRATEGY(bp) is replaced by one of two new functions:
- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.
DEV_STRATEGY(bp) is used only for block-to-block device situations.
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.
From Darrin B. Jewell.
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.
From FreeBSD with slight modifications.
Approved by: Frank van der Linden <fvdl@netbsd.org>
panics in ffs_full_fsync because v_specmountpoint requires that the NULL
v_specinfo be followed.
Tidy up in the same order in all error paths so compiler can merge the
code sequences.
Fixes PR kern/22419