lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem). This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
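
For illustration only, here is a tiny userland sketch (not the kernel code) of
the kind of accounting described above: a per-filesystem estimate of dirty
pages that the allocation path bumps and the write-back path drains, which a
flusher can compare against a high-water mark. The names (est_fs,
dirty_page_estimate, page_high_water) are made up for the sketch.

    #include <stdio.h>

    /* Hypothetical per-filesystem bookkeeping for this sketch. */
    struct est_fs {
            long dirty_page_estimate;       /* pages we believe are dirty */
            long page_high_water;           /* flush trigger; 0 = disabled */
    };

    /* A write has dirtied 'npages' pages through the allocation path. */
    static void
    note_pages_dirtied(struct est_fs *fs, long npages)
    {
            fs->dirty_page_estimate += npages;
    }

    /* 'npages' pages have been written back and are clean again. */
    static void
    note_pages_cleaned(struct est_fs *fs, long npages)
    {
            fs->dirty_page_estimate -= npages;
            if (fs->dirty_page_estimate < 0)        /* it is only an estimate */
                    fs->dirty_page_estimate = 0;
    }

    /* Should a flush be triggered for this filesystem? */
    static int
    needs_flush(const struct est_fs *fs)
    {
            return fs->page_high_water != 0 &&
                fs->dirty_page_estimate > fs->page_high_water;
    }

    int
    main(void)
    {
            struct est_fs fs = { 0, 128 };

            note_pages_dirtied(&fs, 200);
            printf("flush? %d\n", needs_flush(&fs));        /* 1 */
            note_pages_cleaned(&fs, 150);
            printf("flush? %d\n", needs_flush(&fs));        /* 0 */
            return 0;
    }
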
be assured that the last byte of a file is always allocated. Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk. Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, causing fsck_lfs to complain
of an inode "partially truncated".
stuff under '#ifdef DEBUG', and use sysctl knobs to turn particular parts of
the debugging output on and off (if DEBUG is enabled). Re-enable the LFS
statistics in sysctl while I'm there. A bit of a rototill.
* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems whose block size is equal to MAXBSIZE to be mounted.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations, so that we
should no longer see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free. (A small sketch of this clamping follows
this list.)
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem busy with vfs_busy(), the cleaner
can be holding the lock that umount() is blocking on and then try to
vfs_busy() again in getnewvnode().
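
As a small illustration of the lfs_statvfs() item above (never report
bfree < 0 or above lfs_dsize), here is a standalone sketch of the clamping;
report_bfree() and its arguments are illustrative names, not the kernel's.

    #include <stdio.h>
    #include <stdint.h>

    /*
     * The in-core free-block count is signed and may legitimately dip
     * below zero; the value handed to userland is clamped to [0, dsize]
     * so df(1) never sees nonsense.
     */
    static uint64_t
    report_bfree(int64_t bfree, uint64_t dsize)
    {
            if (bfree < 0)
                    return 0;
            if ((uint64_t)bfree > dsize)
                    return dsize;
            return (uint64_t)bfree;
    }

    int
    main(void)
    {
            printf("%llu\n", (unsigned long long)report_bfree(-5, 1000));   /* 0 */
            printf("%llu\n", (unsigned long long)report_bfree(2000, 1000)); /* 1000 */
            printf("%llu\n", (unsigned long long)report_bfree(42, 1000));   /* 42 */
            return 0;
    }
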
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.
Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, the update should be delayed even beyond close, but this
is enough to help substantially with device nodes.)
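
A minimal userland sketch of the timestamp behaviour described above: reads
and writes only mark the in-core inode, and the timestamps are folded in at
close (or explicit sync) time. All struct and function names here are
hypothetical.

    #include <stdio.h>
    #include <time.h>

    struct mini_inode {
            time_t atime, mtime;            /* "on-disk" timestamps */
            int access_pending;             /* atime needs updating */
            int modify_pending;             /* mtime needs updating */
    };

    /* Reads and writes only set a flag; nothing is scheduled for disk. */
    static void
    mark_access(struct mini_inode *ip)
    {
            ip->access_pending = 1;
    }

    static void
    mark_modify(struct mini_inode *ip)
    {
            ip->modify_pending = 1;
    }

    /* On close (or explicit sync), fold the pending marks into the inode. */
    static void
    update_timestamps(struct mini_inode *ip)
    {
            time_t now = time(NULL);

            if (ip->access_pending)
                    ip->atime = now;
            if (ip->modify_pending)
                    ip->mtime = now;
            ip->access_pending = ip->modify_pending = 0;
    }

    int
    main(void)
    {
            struct mini_inode ip = { 0, 0, 0, 0 };

            mark_access(&ip);               /* many cheap marks... */
            mark_modify(&ip);
            update_timestamps(&ip);         /* ...one real update at close */
            printf("atime=%ld mtime=%ld\n", (long)ip.atime, (long)ip.mtime);
            return 0;
    }
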
And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
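
To make the MNT_LAZY distinction concrete, here is a toy sketch: a lazy
(trickle) pass pushes only inode metadata, while a full sync writes file data
as well. The flag values and helper functions are illustrative stand-ins for
VOP_UPDATE()/VOP_FSYNC(), not the real kernel interfaces.

    #include <stdio.h>

    /* Illustrative sync modes. */
    #define MNT_WAIT 1
    #define MNT_LAZY 3

    static void
    update_inode_metadata(const char *name)
    {
            printf("%s: metadata written\n", name);
    }

    static void
    write_file_data(const char *name)
    {
            printf("%s: data written\n", name);
    }

    /* Per-vnode work done by the periodic syncer, depending on the mode. */
    static void
    sync_vnode(const char *name, int waitfor)
    {
            if (waitfor == MNT_LAZY) {
                    /* trickle sync: push metadata only */
                    update_inode_metadata(name);
                    return;
            }
            /* full sync: data and metadata */
            write_file_data(name);
            update_inode_metadata(name);
    }

    int
    main(void)
    {
            sync_vnode("vnode0", MNT_LAZY);
            sync_vnode("vnode0", MNT_WAIT);
            return 0;
    }
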
VOP_STRATEGY(bp) is replaced by one of two new functions:
- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.
DEV_STRATEGY(bp) is used only for block-to-block device situations.
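
A toy illustration of the split: I/O issued through a vnode goes to that
vnode's strategy routine, while a buffer aimed directly at a block device goes
to the device driver's d_strategy. The structures below are simplified
stand-ins; in particular, the real DEV_STRATEGY(bp) finds the driver via
bp->b_dev, whereas the sketch passes it explicitly to stay self-contained.

    #include <stdio.h>

    struct buf { const char *what; };

    struct vnode {
            void (*v_strategy)(struct vnode *, struct buf *);
    };

    struct bdevsw {
            void (*d_strategy)(struct buf *);
    };

    /* VOP_STRATEGY(vp, bp): call the strategy routine of vp for bp. */
    static void
    vop_strategy(struct vnode *vp, struct buf *bp)
    {
            vp->v_strategy(vp, bp);
    }

    /* DEV_STRATEGY(bp): hand bp straight to the block device driver. */
    static void
    dev_strategy(struct bdevsw *dev, struct buf *bp)
    {
            dev->d_strategy(bp);
    }

    static void
    fs_strategy(struct vnode *vp, struct buf *bp)
    {
            (void)vp;
            printf("filesystem strategy: %s\n", bp->what);
    }

    static void
    disk_strategy(struct buf *bp)
    {
            printf("disk driver strategy: %s\n", bp->what);
    }

    int
    main(void)
    {
            struct vnode vp = { fs_strategy };
            struct bdevsw dev = { disk_strategy };
            struct buf b1 = { "file block" }, b2 = { "raw device block" };

            vop_strategy(&vp, &b1);         /* routed through the vnode */
            dev_strategy(&dev, &b2);        /* block-to-block device path */
            return 0;
    }
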
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through, and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V.
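
A small sketch of the prototype change: a function that used to take a
"struct proc *" now takes a "struct lwp *" and recovers the process via
l_proc only where it is actually needed. The structs below are trivial
stand-ins for the real ones.

    #include <stdio.h>

    struct proc { int p_pid; };
    struct lwp  { struct proc *l_proc; int l_lid; };

    /* Before (illustrative): int do_something(struct proc *p, int arg); */

    /* After: the lwp is what gets passed down the call chain. */
    static int
    do_something(struct lwp *l, int arg)
    {
            struct proc *p = l->l_proc;     /* only when the proc is needed */

            printf("pid %d, lwp %d, arg %d\n", p->p_pid, l->l_lid, arg);
            return 0;
    }

    int
    main(void)
    {
            struct proc p = { 123 };
            struct lwp l = { &p, 1 };

            return do_something(&l, 42);
    }
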
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:
* Include the fs ident in the filehandle; improve stale filehandle checks.
* Change the definition of blksize() to use the size stored in the on-disk
dinode instead of the inode's i_size, so that fsck_lfs will work properly
again.
* Use b_interlock in lfs_vtruncbuf.
* Postpone dirop reclamation until after the seglock has been released,
so that lfs_truncate is not called with the segment lock held.
* Don't loop in lfs_fsync(), just write everything and wait.
* Be more careful about the interlock/uobjlock in lfs_putpages: when we
lose this lock, we have to resynchronize dirtiness of pages in each
block.
* Be sure to always write indirect blocks and update metadata in
lfs_putpages; fixes a bug that caused blocks to be accounted to the
wrong segment.
64 bit block pointers, extended attribute storage, and a few
other things.
This commit does not yet include the code to manipulate the extended
storage (e.g. for ACLs); this will be done later.
Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
of segments to mark. However, this may be much more than lfs_nseg.
Originally this wasn't a big problem, since only the structures in the
disk block were changed, but nowadays there's a mirror of the segflags
in the in-core superblock. This problem caused the code to walk
way past the end of that allocated area, causing memory corruption
in other kernel structures. So, use lfs_nseg as the maximum, as it should be.
While here, simplify the loop; it had become an obfuscated piece of
code over time.
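
Illustrative sketch of the bound described above: whatever count the caller
hands in, the loop that mirrors segment flags into the in-core table never
walks more than nseg entries. Names and layout are made up for the example.

    #include <stdio.h>

    #define NSEG 8                                  /* stand-in for lfs_nseg */

    static unsigned int incore_segflags[NSEG];      /* in-core mirror */

    /*
     * Mark the first 'count' segments with 'flags', but never walk past
     * the NSEG entries that were actually allocated, no matter how large
     * a count the caller passes in.
     */
    static void
    mark_segments(int count, unsigned int flags)
    {
            int i;

            if (count > NSEG)
                    count = NSEG;
            for (i = 0; i < count; i++)
                    incore_segflags[i] |= flags;
    }

    int
    main(void)
    {
            int i;

            mark_segments(1000, 0x1);               /* deliberately too many */
            for (i = 0; i < NSEG; i++)
                    printf("seg %d flags %#x\n", i, incore_segflags[i]);
            return 0;
    }
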
when the filesystem is unmounted, relocking the Ifile when its lock is
draining. (We can't use vfs_busy() since the process is sleeping for a
good long time.) Clean up / organize lfs.h, while I'm here.
In lfs_update_single, assert that disk addresses are either negative or
still positive when converted to int32_t, to prevent a recurrence of the
negative/positive block problem.
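
A standalone sketch of the assertion's intent: a disk address stored in a
32-bit field must either be a negative sentinel or survive conversion to
int32_t without changing. daddr_ok() is an illustrative helper, not the
kernel's assertion itself.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * A disk address is acceptable if it is negative (a sentinel such as
     * UNWRITTEN) or if converting it to int32_t and back leaves it
     * unchanged, i.e. it is still the same positive value.
     */
    static int
    daddr_ok(int64_t daddr)
    {
            if (daddr < 0)
                    return 1;
            return (int64_t)(int32_t)daddr == daddr;
    }

    int
    main(void)
    {
            assert(daddr_ok(-1));                   /* sentinel: fine */
            assert(daddr_ok(123456));               /* small positive: fine */
            assert(!daddr_ok((int64_t)1 << 40));    /* would be mangled */
            printf("daddr checks behaved as expected\n");
            return 0;
    }
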