so that they get reused with a invalid pointer to a mount structure.
As a workaround, free the vnodes used to create the in-filesystem journal
immediately.
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)
This removes the need for the SAVENAME and HASBUF namei flags.
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.
Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).
The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
the earlier change caused data corruption by freeing pages
without invaliding their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
Note: there is a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them. However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).
Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass). I don't know how we're going to
translate this into an easy regression test, though. Maybe with
a hacked newfs?
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.
Ok: YAMAMOTO Takashi <yamt@netbsd.org>
in the vnode. All LK_* flags move from sys/lock.h to sys/vnode.h. Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().
Welcome to 5.99.34.
Discussed on tech-kern.
- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
LK_SHARED and LK_NOWAIT. LK_INTERLOCK is no longer allowed as it
makes no sense here.
- VOP_ISLOCKED(vp): Remove the for some time unused return value
LK_EXCLOTHER. Mark this operation as "diagnostic only".
Making a lock decision based on this operation is no longer allowed.
Discussed on tech-kern.
system drivers where it was missing from and fixes one buggy
implementation. The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
* XXX: Get extra reference to LFS vfsops. This prevents unload,
* but also prevents kernel panic due to text being unloaded
* from below lfs_writerd. When lfs_writerd can exit, remove
* this!!!
*/
Could use UFS_OPS, but:
1) the lfs kernel module depends on full ffs already anway
2) lfs is being split from ufs, so this will automatically
go away soon
3) chances of anyone wanting an lfs-only kernel are pretty slim
4) i'm too lazy to figure out how to test ffs_snapgone() is
still called properly if I change the call ;)
new helper function.
Use this information to query physical sector sizes for WAPBL
instead of hardcoded defaults.
No longer limits physical sector sizes to 512 bytes.
- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
The code uses a complicated unity function that just makes the
code difficult to understand.
- support larger sector sizes. Fix disk address computations
to use DEV_BSIZE in the kernel as required by device drivers
and to use sector sizes in userland.
- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
allocated to extend the file to the new size. Releasing all pages
may release pages that contains previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota of file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size cause
genfs_do_io() to write to disk past the real end of the file.
Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utitlies share the same fsbtodb macros.
-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after is has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferending a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs function traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.
no functional change
reference while we were getting the v_interlock.
vget(): attempt prevent it from returning a clean vnode:
if the vnode is being inactivated (by vrelel()), wait for
vrelel() to complete (or return EBUSY if we can't wait), and return
ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
hack is ffs_sync().
- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.
Reviewed by: Antti Kantee <pooka@netbsd.org>
vput(vp);
error = VFS_VGET(vp->v_mount, ...);
just isn't right. Because of vnode caching this *probably* never bit
anyone, except maybe under very heavy load, but still.
Note that the race also exists between 2 nfs client, one of them doing the rm.
In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
vget() can sleep. After vget returns, check that vp is still connected with
ip, and that ip still points to the inode we want. This fix the NULL
pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.
XXX I have no idea why using vput() instead of
vlockmgr(vp->v_vnlock, LK_RELEASE); vrele(vp); does not work.
> Fix bug introduced in revision 1.174(*) where a NULL fspec with an MNT_UPDATE
> command would always return EINVAL. This broke fsck on root, where fsck'ing
> a dirty root would always return an error causing rc to resort in a reboot.
(*) This is "Apply the NFS exports list rototill patch" change
in ext2fs_vfsops.c rev 1.91.
> Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
> instead of just vnode pointers. Fixes erroneous "does not match mounted
> device" errors from mount(8) in the presence of MFS /dev, init.root, &c.
Fixes LOCKDEBUG panic which is the same one mentioned in PR kern/41078
on trying to mount_ext2fs against a raw device, while that panic
seems to have another route cause around module_autoload() in
sys/miscfs/specfs/spec_vnops.c:spec_open().
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to resort in a reboot.
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr
All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.
XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.