- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.
- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.
Discussed on tech-kern.
Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
hack is ffs_sync().
- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.
Reviewed by: Antti Kantee <pooka@netbsd.org>
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like now dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was not flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep
Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.
- Expunge WAPBL log inodes from snapshots.
- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.
- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.
- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.
Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>
PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
snapshots. With this policy in place:
- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.
- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.
- Move ffs_read() for snapshots into ffs_snapshot.c.
Reviewed by: Jason Thorpe <thorpej@netbsd.org>
While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
repeatable panic in fstrans_getstate() found while searching for a
different USB bug. Also makes the code somewhat more readable.
Patch from Juergen Hannken-Illjes with a small rearrangement from me.
Approved by: hannken
Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.
OK'd by core@, releng@.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.
quick consensus on tech-kern
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.
Implemented for file systems of type ffs.
The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.
Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.
Welcome to 4.99.9 (new vfs op vfs_suspendctl).
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages end up to allocate pages past eof unnecessarily.
backing file per attribute type indexed by inode number to hold the extended
attributes.
This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.
This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.
Adapted from FreeBSD.