We may sleep in it, or even recurse, with softdeps. Instead, grab
the lock later, but check if noone else has beaten us to the VFS_VGET
operation, and if so, roll back getnewvnode using vinsheadfree, and
just return.
Change the space computation to appear to change the size of the *disk*
rather than the *bytes used* when more segment summaries and inode
blocks are written. Try to estimate the amount of space that these will
take up when more files are written, so the disk size doesn't change too
much.
Regularize error returns from lfs_valloc, lfs_balloc, lfs_truncate: they
now fail entirely, rather than succeeding half-way and leaving the fs in
an inconsistent state.
Rewrite lfs_truncate, mostly stealing from ffs_truncate. The old
lfs_truncate had difficulty truncating a large file to a non-zero size
(indirect blocks were not handled appropriately).
Unmark VDIROP on fvp after ufs_remove, ufs_rmdir, so these can be
reclaimed immediately: this vnode would not be written to disk again
anyway if the removal succeeded, and if it failed, no directory
operation occurred.
ufs_makeinode and ufs_mkdir now remove IN_ADIROP on error.
instead of keeping it always == 1. (The ifile version number is
increased on vfree.) May address PR #7213, but I haven't been able to
test thoroughly enough to say for sure.
references (locked for VOP_INACTIVE at the end of vrele) and it's okay.
Check the return value of lfs_vref where appropriate.
Fixes PR #s 10285 and 10352.
require it to be set via tunefs(8). Silently ignore it when doing
an update mount of a writeable filesystem, the FFS/softdep code isn't ready
for this yet.
the head of the inode free list (on the superblock) always matches the
rest of the free list (in the ifile).
Protect lfs_fragextend with seglock, to prevent the segment byte count
fudging from making its way to disk.
Don't try to inactivate dirop vnodes that are still in the middle of
their dirop (may address PR#10285).
* Move the clearing of IN_MODIFIED and IN_ACCESSED later, so they are not
cleared if the bread() failed.
* Explicitly set waitfor to 0 in the softdep case, if IN_MODIFIED is not
set (mirroring the bwrite()/bdwrite() decision).
case, which created inodes with dependencies, but no IN_* flag set,
so the dependencies were never flushed (after the waitfor check in
ffs_update was removed).
blocks are detached from the vnode at this point. When the dependencies are
broken to enable writing the blocks, the vnode will be regenerated. (The only
reason we sync buffers in this case is that they have to be detached from the
vnode.)
All the dirop vnops now mark the inodes with a new flag, IN_ADIROP, which
is removed as soon as the dirop is done (as opposed to VDIROP which stays
until the file is written). To address one issue raised in PR#9357.
queueing up buffers and awakening the MFS server process to do the I/O,
we do the I/O to the server process's address space directly using
facilities provided by UVM.
This makes it possible for buffers attempting to flush out while the
MFS is being unmounted to actually do the I/O, where before it would
fail if the server process wasn't in the MFS idle loop (i.e. had been
signaled and was attempting to exit).
Should fix kern/10122 (I can no longer reproduce the problem described
in the PR when running with these changes), and any number of other
MFS-related complaints made by people over time.
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.
Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.
Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be wait for, or not, accordingly.
Closes PR#8996.
buffer cache flags, to marking the inode and/or indirect blocks with a
special disk address UNWRITTEN==-2 when a block is accounted for. (This
address is never written to disk, but only used in-core. This is essentially
the same method of block accounting as on the UBC branch, where the buffer
headers don't exist.) Make sure that truncation is handled properly,
especially in the case of holey files.
Fixes PR#9994.
superblock (whose disk address is stored in the primary superblock). Also,
refuse to mount a filesystem whose superblocks overlap or where the alt.
superblock has a lower disk address than the primary superblock.
Solves PR#10001.
- lfs_truncate extends the file if called with length > i_ffs_size;
- lfs_truncate errors out if called with length < 0;
- lfs_balloc block accounting corrected for the case of blocks read
into the cache before they exist on disk;
- mp->mnt_stat.f_iosize is initialized in lfs_mountfs.
an optimalization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
Resources are initialized still just once (on first call).
Add ufs_done(), which takes care of freeing all resources allocated in
ufs_init(). The resources are freed only when last user of the code exits.
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.
For each leaf filesystem, add appropriate vfs_done routine.
Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading. This fixes random panics
when LKM for filesystem using pools was loaded and unloaded several
times.
For each leaf filesystem, add appropriate vfs_done routine.
For symlinks > 60 chars we were bzero'ing part of (struct inode) past the
actual inode struct, corrupting memory following the current (struct inode)
resuling in a 'panic: pool_get(lfsinopl): free list modified' later.
This could also be the cause of random panics. With this fix LFS seems to be
useable for me now.
mode and ownership bits are flushed to disk before the vnode is
reclaimed.
The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk. This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.
Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
1.4 branch.
* Use a separate per-fs lock, instead of ufs_hashlock, to protect the Inode
free list. This seems to prevent the "lockmgr: %d, not exclusive lock holder
%d, unlocking" message I was mis-attributing last night to an unlocked vnode
being passed to vrele.
* Change calling semantics of lfs_ifind, to give better error reporting:
If fed a struct buf, it can report the block number of the offending inode
block as well as the inode number.
* Back out rev 1.10 of lfs_subr.c, since the replacement code was slightly
uglier while being functionally identical.
* Make lfs_vunref use the same free list convention as vrele/vput, so that
vget does not remove vnodes from a hash list they are not on.
from being inactivated under some conditions. Removed vnodes are now
inactivated when the VDIROP flag is cleared, and to prevent block
accounting problems this clearing has been postponed until
lfs_segunlock.
This prevents a rare condition in which Ifile "ifile" blocks, that is, the
blocks of the ifile which point VOP_VGET at the inode block containing the
requested inode, from being "unwritten" when cleaning during intense disk
activity.
dirop writing. In particular, lfs_writevnodes now writes all buffers from
a flushed vnode whether cleaning or not, and the same with the Ifile; and
lfs_segwrite does not attempt to write data from other non-cleaning vnodes,
even if a vnode is being flushed.
(Previously buffers could be marked dirty by the cleaner, and possibly by
other means.)
Also check for softdep mount in vfs_shutdown before trying to bawrite
buffers, since other filesystems don't need it and lfs doesn't bawrite.
(This fragment reviewed by fvdl.)
Partially addresses PR#8964.
depend on the initial lookups being doen with SAVESTART), and b) check
return values for errors.
Should fix PR 8491 for ufs - two simultaneous identical renames will now
work correctly. One will succeed, one will fail.
default, as the copyright on the main file (ffs_softdep.c) is such
that is has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.
Bump version number to 1.4O
system crashed, inodes could be allocated that were not referenced. (Though
not a serious problem, it evidences itself in phase 4 of fsck_lfs.) Fix
this by marking if_daddr with UNASSIGNED before the inodes are actually
written; at mount time the ifile is checked for UNASSIGNED entries and
any that are found are linked back into the free list. (The latter
functionality should move into the roll-forward agent when it materializes.)
post-mortem of a production machine. Also, take the active dirop
count off of the fs and make it global (since it is measuring a global
resource) and tie the threshold value LFS_MAXDIROP to desiredvnodes.
fail, because the particular block being requested was always in the cache
(although other routines that cannot afford to call lfs_check have in the
meantime stuffed the cache full of dirty blocks). Partially addresses PR 8383.
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.
Make all unmount routines lock the device vnode before calling VOP_CLOSE().
filesystem. In particular,
- Fix mknod deadlock, described in PR 8172.
- Enable lfs_mountroot.
- Make lfs_writevnodes treat filesystems mounted on lfs device nodes properly,
by flushing that device rather than trying to add blocks to the device inode.
This, in combination with lfs boot blocks, will allow operation of an all-lfs
system.
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().
Reviewed by: thorpej
Tested by: wrstuden
a bug in fragment extension that could run the count negative. Also, don't
overcount for inodes, and don't count segment summaries. Thus, for empty
segments the live bytes count should now be exactly zero.
will DTRT with vnodes marked VDIROP. In particular, the message
"flushing VDIROP" will no longer appear, and the filesystem will remain
stable in the event of a crash.
This was particularly a problem with NFS-exported LFSes, since fsync
was called on every file close.
if the version number is higher than we know about. This allows, e.g.,
changes in the format of the ifile, segment size restrictions and boundaries,
etc., which would not affect existing fields in the superblock, but which
would drastically affect the filesystem, to be smoothly integrated at a
later date.
on (nodes which are not marked IN_MODIFIED/IN_CLEANING, but which have dirty
buffers), by marking them with the appropriate flag if dirtybuffers were added
while the write was in progress.
conditions. Also change the default setting of lfs_clean_vnhead to 0, which
seems to make the locking problems go away (although this is difficult to
test as I can't reliably reproduce them).
then immediately reloaded, their dinodes were located in an inode block
which was not on disk at the advertized location, nor in the cache (although
it would be flushed to disk next segment write). Fix this by using getblk()
instead of lfs_newbuf() for inode blocks.
for the first write. If this is not done, the cleaner may try to clean the
current segment out from under the writer if the filesystem is mounted after
a crash (or any other time that the dirty:clean segment ration is high enough).
* The MNT_UPDATE case had a null pointer dereference. (This is a good example
of why blindly adding bogus initializiers is a FUNDAMENTALLY BAD IDEA!)
* Make sure the whole ufsmount is zeroed, as the export code relies on this.
* If we decided to use the second/alternate superblock, make sure to copy the
in-core version from the right buffer.
Also, reenable NFS exporting.
in turn forces a flush of the vnode, whether or not it is involved in a dirop.
(This can happen during a remove or rmdir, when the directory is shrunk.)
Because of the nature of dirops, however, flushing a vnode involved in a dirop
is disallowed (and was marked with a panic). This patch has lfs_truncate
call a specialized vinvalbuf that only invalidates buffers following the new
end-of-file, and thus does not require a flush. Also the panic is demoted,
in case I missed any other path to lfs_vflush.
namely, toggle whether vnodes loaded only for cleaning (as opposed to
normal filesystem use) are freed to the *head* of the vnode free list,
rather than the tail. This should avoid a possible cache flushing
effect, if the cleaner cleans a segment containing a large number of
live inodes.
dirop is completely written to disk. This means that ordinary calls to
ufs vnops which would ordinarily call VOP_INACTIVE through vrele/vput,
don't. This patch detects that condition after such vnops have been
run, and calls VOP_INACTIVE if it would ordinarily have been called by
the ufs call.
if we are short on vnodes, lfs_vflush from another process can grab a
vnode that lfs_markv has already processed but not yet written; but
lfs_markv holds the seglock. When lfs_vflush gets around to writing it,
the context for copyin is gone. So, now lfs_markv calls copyin itself,
rather than having lfs_writeseg do it.
the LFS since the 4.4lite2 code was merged into NetBSD.
TODO updated to remove everything marked DONE in 4.4, and add in a list
of more current things to do.
Get rid of comments about the cleaner syscall code and missing fragment
support from README.
include:
- DIROP segregation is enabled, and greater care is taken
to make sure that a checkpoint completes. Fsck is not
needed to remount the filesystem.
- Several checks to make sure that the LFS subsystem does not
overuse various resources (memory, in particular).
- The cleaner routines, lfs_markv in particular, are completely
rewritten. A buffer overflow is removed. Greater care is taken
to ensure that inodes come from where lfs_cleanerd say they come
from (so we know nothing has changed since lfs_bmapv was called).
- Fragment allocation is fixed, so that writes beyond end-of-file
do the right thing.
minor of libc and the major of libutil). For little-endian architectures
merge the bnswap() assembly versions with nto* and hton* using symbols
aliasing. Use symbol renaming for the bswap function in this case to avoid
namespace pollution.
Declare bswap* in machine/bswap.h, not machine/endian.h. For little-endian
machines, common code for inline macros go in machine/byte_swap.h
Sync libkern with libc.
Adjust #include in kernel sources for machine/bswap.h.
Make ext2fs_init() call ufs_init(). it was doing the init by itself,
testing for extern done != 0. This bug was hidden by the fact that
ext2fs_init() is called before ffs_init().