simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)
releng@ acknowledged
- more verbose output if vfs.wapbl.verbose_commit >= 2.
namely, the time taken for each DIOCCACHESYNC call.
wapbl_flush: 1322826000.785245900 this transaction = 546304 bytes
wapbl_cache_sync: 1: dev 0x0 0.017572724
wapbl_cache_sync: 2: dev 0x0 0.007199825
wapbl_flush: 1322826011.860771302 this transaction = 431104 bytes
wapbl_cache_sync: 1: dev 0x0 0.019469753
wapbl_cache_sync: 2: dev 0x0 0.009473410
wapbl_flush: 1322829266.489154342 this transaction = 187904 bytes
wapbl_cache_sync: 1: dev 0x4 0.022270180
wapbl_cache_sync: 2: dev 0x4 0.030749402
- fix a comment.
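The per-call timings in the sample output above amount to a timestamp taken
before and after each DIOCCACHESYNC, with the difference printed. A minimal
userland sketch of that arithmetic follows. The helper names ts_sub and
format_sync_line are hypothetical and the kernel code may well differ; this
only illustrates the subtraction and the output layout.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: return end - start, borrowing one second when
 * the nanosecond field would otherwise go negative.
 */
static struct timespec
ts_sub(struct timespec end, struct timespec start)
{
	struct timespec res;

	res.tv_sec = end.tv_sec - start.tv_sec;
	res.tv_nsec = end.tv_nsec - start.tv_nsec;
	if (res.tv_nsec < 0) {
		res.tv_sec--;
		res.tv_nsec += 1000000000L;
	}
	return res;
}

/*
 * Hypothetical helper: format one line in the style of the sample
 * output (pass number, device number, elapsed seconds.nanoseconds).
 */
static int
format_sync_line(char *buf, size_t len, int pass, unsigned long dev,
    struct timespec elapsed)
{
	return snprintf(buf, len, "wapbl_cache_sync: %d: dev 0x%lx %lld.%09ld",
	    pass, dev, (long long)elapsed.tv_sec, (long)elapsed.tv_nsec);
}
```

A caller would sample the clock on both sides of the ioctl, pass the two
values to ts_sub, and hand the result to format_sync_line.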
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
(and much lower) value. When flushing the log these deallocations will
produce new blocks and that may exceed the journal size resulting in
a "wapbl_flush: current transaction too big to flush" panic.
Seen when removing a large snapshot.
Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
protect wl_dealloc* members. Take the mutex here and change the lock
requirements of these fields to "writer lock or mutex".
This error led to file system corruption and "freeing free block" panics.
-wrong initialization reported in a followup to PR bin/43336
(looks harmless because it applies to zero-initialized memory, so
LIST_INIT() is a no-op)
-wrong loop count in reply misses a hash bucket (PR kern/43827)
(this was introduced by a post-netbsd-5 change, so it isn't related
to the PR above)
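The second fix is a loop bound that stops one bucket short. A generic sketch
of that class of bug follows. The table layout (NBUCKETS, struct entry) and
the function name count_entries are hypothetical and are not taken from the
code touched by the PR.

```c
#include <stddef.h>

#define NBUCKETS 8	/* hypothetical table size */

struct entry {
	struct entry *next;
	int value;
};

/*
 * Walk the first "nbuckets" chains of the table and count the entries.
 * Passing NBUCKETS visits every chain; passing NBUCKETS - 1 reproduces
 * the off-by-one and silently skips the last bucket.
 */
static size_t
count_entries(struct entry *const table[], size_t nbuckets)
{
	size_t n = 0;

	for (size_t i = 0; i < nbuckets; i++) {
		for (const struct entry *e = table[i]; e != NULL; e = e->next)
			n++;
	}
	return n;
}
```

With one entry in bucket 0 and one in the last bucket, the full bound
counts both while the short bound counts only the first.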
conditionally build DEV_BSIZE adjustments for kernel. fsck_ffs shares
the same code but accesses physical blocks.
Also compute correct block numbers for each physical sector.
Also change access to filesystem blocks to be done by fragment instead
of by physical block. Fragments are the fundamental blocks of the
filesystem.
For a theoretical filesystem that accesses the disk in smaller units
than stored in mp->mnt_fs_bshift, the assumption might be wrong. But
this will also break other subsystems. The value mp->mnt_dev_bshift
which formally represents the physical sector size is currently only
virtual in NetBSD (always DEV_BSIZE).
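The unit conversions described above can be sketched as plain shifts. This is
an illustration under stated assumptions, not the committed code: DEV_BSHIFT
is defined locally as 9 (DEV_BSIZE of 512 bytes), the function names are
hypothetical, and both the fragment size and the physical sector size are
assumed to be powers of two no smaller than DEV_BSIZE.

```c
#include <stdint.h>

#define DEV_BSHIFT 9	/* log2(DEV_BSIZE), with DEV_BSIZE assumed 512 */

/*
 * Hypothetical helper: convert a fragment number into a block number
 * in DEV_BSIZE units, given log2 of the fragment size in bytes.
 */
static int64_t
frag_to_devblk(int64_t frag, int fshift)
{
	return frag << (fshift - DEV_BSHIFT);
}

/*
 * Hypothetical helper: convert a DEV_BSIZE block number into a
 * physical sector number, given log2 of the sector size in bytes.
 * The result is rounded down to the sector containing the block.
 */
static int64_t
devblk_to_physsec(int64_t blk, int secshift)
{
	return blk >> (secshift - DEV_BSHIFT);
}
```

For 2048-byte fragments (fshift 11), fragment 3 starts at DEV_BSIZE block 12;
on a disk with 4096-byte sectors (secshift 12) that block lies in physical
sector 1.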
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they did not hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
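The flush policy for logging described above can be read as a simple rate
limit: a periodic sync flushes only if at least 10 seconds have passed since
the last flush, while a full log or an explicit fsync() flushes at once. A
minimal sketch of that decision follows. The names (struct flush_state,
should_flush, FLUSH_MIN_INTERVAL) are hypothetical and this is not the logic
in vfs_wapbl.c itself.

```c
#include <stdbool.h>
#include <time.h>

#define FLUSH_MIN_INTERVAL 10	/* seconds; taken from the text above */

/* Hypothetical per-file-system state: time of the last log flush. */
struct flush_state {
	time_t last_flush;
};

/*
 * Decide whether to flush metadata and the log now.
 * - log_full: the log has no room left, so flush regardless of time.
 * - explicit_sync: the caller is fsync() or similar, whose behaviour
 *   is unchanged, so flush regardless of time.
 * - otherwise: flush only if the minimum interval has elapsed.
 */
static bool
should_flush(const struct flush_state *st, time_t now, bool log_full,
    bool explicit_sync)
{
	if (log_full || explicit_sync)
		return true;
	return (now - st->last_flush) >= FLUSH_MIN_INTERVAL;
}
```

The caller would update last_flush after each successful flush, so inode
update activity between flushes is batched rather than triggering a log sync
per inode.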
Merge wapbl_replay_get_inodes into wapbl_replay_prescan. Change the
logic to determine the head: It doesn't make sense to update it if the
last inode record seen was not the beginning of the journal, as the
beginning of the journal might not be 0, so always update inodeshead.
transactions. The initial prescan has already sorted out what blocks are
in the journal and removed any revoked blocks, so the hash table is
authoritative.
This changes the order of hook processing as the copy-on-write handlers
are called after the journal processing. This makes more sense as the
journal overwrite is logically part of the disk IO.