directory. This doesn't affect non-wapbl renames; it affects wapbl because
one of the lock acquisitions was moved up past where this case otherwise
fails.
PR 40163 from Lloyd Parkes.
ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
beyond file system size. Needed to read the snapblklist
on mount.
Persistent snapshots work again.
Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
ffs_snapshot_read(): Allow the kernel to read beyond file system size.
Persistent snapshots work again.
Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.
Reviewed and tested by hannken@.
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
This changes the order of hook processing as the copy-on-write handlers
are called after the journal processing. This makes more sense as the
journal overwrite is logically part of the disk IO.
- Count frags, not blocks to get the file system size.
- Cannot use blksize() here, it depends on vnode size.
- Correctly update xfersize on short reads.
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
helper functions to enhance readability. Adjust comments to reality
and test the main error paths.
While here, expand and remove the last FreeBSD->NetBSD conversion macros.
No functional change intended.
- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.
- Expunge WAPBL log inodes from snapshots.
- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.
- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.
- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.
Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>
PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
snapshots. With this policy in place:
- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.
- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.
- Move ffs_read() for snapshots into ffs_snapshot.c.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
repeatable panic in fstrans_getstate() found while searching for a
different USB bug. Also makes the code somewhat more readable.
Patch from Juergen Hannken-Illjes with a small rearrangement from me.
Approved by: hannken
an UFS2 file system. With the current cylinder group buffer busy it
calls ffs_getblk(). This runs through copy-on-write and may need the
current cylinder group buffer to allocate a new block for the snapshot.
While here write the cylinder group buffer synchronously after
cg_initediblk was changed because fsck_ffs will trust it.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.
OK'd by core@, releng@.
Release allocated indir blocks on non-softdep file systems instead
of writing them twice.
It is sufficient to clean dirty data pages to avoid UBC inconsistencies.
ffs_snapblkfree() and wrsnapblk():
If a snapshots effective link count is zero there is no need
to use synchronous writes.
ffs_copyonwrite():
Defer locking the snapshots until there is a need to copy the block.
wrsnapblk():
Use vn_rdwr() instead of bwrite() to write to the snapshots.
mlelstv@ points out FreeBSD fixed the same thing a couple of years
ago - here's the commit message they used on rev 1.127:
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
its not on a free list.
Also change buf_init() to not automatically mark buffers `busy' since this
only makes sense for bufcache buffers.
Mark all buf_init'd buffers 'busy' on the places where they ought to be
flagged as such to not confuse the buffer cache.
Fixes PR 38923.
If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.
The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.
Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
Use ufs_getlbns()/bread() instead.
Saves some reads and removes deep recursion with possible deadlock
when ffs_balloc() runs copy-on-write on the buffer returned.
run through copy-on-write. Call fscow_run() with valid data where possible.
The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.
- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.
- Always run copy-on-write on buffers returned from ffs_balloc().
- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.
Welcome to 4.99.63
Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.
As a consequence, most of the file systems can now be loaded as new style
modules.
Quick sanity check by ad@.
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.
Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:
- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.
- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.
One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
The symptom was that sometimes file systems would occasionally not appear
in output from 'df' or 'mount' if the system was busy. Resolution:
- Make mount locks work somewhat like vm_map locks.
- vfs_trybusy() now only fails if the mount is gone, or if someone is
unmounting the file system. Simple contention on mnt_lock doesn't
cause it to fail.
- vfs_busy() will wait even if the file system is being unmounted.
we no longer need to guard against access from hardware interrupt handlers.
Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.
Welcome to 4.99.60
Ok: Andrew Doran <ad@netbsd.org>
pushing the syncer before considering rate limiting the deletes. We hold
vnodes locked and it's likely that the syncer will try to lock them while
flushing, leading to the syncer and remover proceeding in lockstep and
making very little forward progress. XXX this is not a solution.
- Reference count the mfsnode to fix an aincent bug. Only destroy when
reference count drops to zero. In mfs_start(), busy the mount and get
a reference to the mfsnode to prevent it disappearing while the server
is running. If the file system is gone already, vfs_busy() will fail.
- Always destroy the bufq.
- Use a global mfs_lock for simplicity.
- Replace use of malloc/free. Fixes broken MALLOC_TYPE change.