1789 Commits

Author SHA1 Message Date
hannken 53b57e3385 Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode.  This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
2010-12-27 18:49:42 +00:00
mlelstv 6c899f7536 For update mounts the root vnode is already in use and we must not
free it. Since the mount persists even when the update fails,
this is not a problem either.
2010-12-24 13:38:57 +00:00
mlelstv 5eee906941 mount(2) doesn't remove vnodes from the freelist in the error path,
so that they get reused with an invalid pointer to a mount structure.

As a workaround, free the vnodes used to create the in-filesystem journal
immediately.
2010-12-23 14:43:37 +00:00
matt 6a66466f0c Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits.  Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
2010-12-20 00:25:23 +00:00
eeh 3c8b71849f Byebye deadlock. 2010-12-18 00:01:46 +00:00
hannken 3b57b82b8f Keep a reference to the snapshot vnode until it gets removed from the
snapshot list.
2010-12-12 10:29:25 +00:00
hannken f29d5492f8 syncsnap: Use bbusy() to take a buffer from v_dirtyblkhd. 2010-12-12 10:28:22 +00:00
dholland 14402d0ff1 Abolish the SAVENAME and HASBUF flags. There is now always a buffer,
so the path in a struct componentname is now always valid during VOP
calls.
2010-11-30 10:43:01 +00:00
dholland d4eb05390d Abolish struct componentname's cn_pnbuf. Use the path buffer in the
pathbuf object passed to namei as work space instead. (For now a pnbuf
pointer appears in struct nameidata, to support certain unclean things
that haven't been fixed yet, but it will be going away in the future.)

This removes the need for the SAVENAME and HASBUF namei flags.
2010-11-30 10:29:57 +00:00
dholland 8f6ed30d57 Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
2010-11-19 06:44:33 +00:00
chs fca58884f4 replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings.  instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
2010-09-01 16:56:19 +00:00
hannken 559469276d ffs_reclaim: don't free an already free inode. This may happen when
ffs_fhtovp() gets a free inode and releases it.
2010-08-12 07:41:49 +00:00
pooka 6e5ca1ed9e add a linefeed to the previous 2010-08-09 17:12:18 +00:00
pooka 5140c8efdf Return error if we try to mount a file system with block size > MAXBSIZE.
Note: there are a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them.  However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).

Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass).  I don't know how we're going to
translate this into an easy regression test, though.  Maybe with
a hacked newfs?
2010-08-09 15:50:13 +00:00
hannken bb874d13b8 Free the on disk inodes in the reclaim routine. 2010-08-04 10:43:53 +00:00
hannken c84e81cad1 Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
2010-07-29 10:54:50 +00:00
hannken 3a7edffde9 ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
2010-07-28 11:03:47 +00:00
jakllsch a9e9b32ddd Make DEBUG_EXT2 work with 64-bit size_t. 2010-07-27 05:15:56 +00:00
hannken fb62bef947 Make holding v_interlock mandatory for callers of vget().
Announced some time ago on tech-kern.
2010-07-21 17:52:09 +00:00
hannken 245651a23d Remove vlockmgr(). Generic vnode lock operations now use a rwlock located
in the vnode.  All LK_* flags move from sys/lock.h to sys/vnode.h.  Calls
to vlockmgr() in file systems get replaced with VOP_LOCK() or VOP_UNLOCK().

Welcome to 5.99.34.

Discussed on tech-kern.
2010-07-01 13:00:54 +00:00
hannken 07c29bfbcb Undo last commit and don't try to lock vnodes in lfs_unmark_dirop()
as we may deadlock trying to write the superblock.

Should fix PR #43503 Can't create device nodes on LFS.
2010-06-25 10:03:52 +00:00
hannken 1423e65b26 Clean up vnode lock operations pass 2:
VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
2010-06-24 12:58:48 +00:00
hannken f6c438ba23 Clean up vnode lock operations:
- VOP_LOCK(vp, flags): Limit the set of allowed flags to LK_EXCLUSIVE,
   LK_SHARED and LK_NOWAIT.  LK_INTERLOCK is no longer allowed as it
   makes no sense here.

- VOP_ISLOCKED(vp): Remove the return value LK_EXCLOTHER, which has been
  unused for some time.  Mark this operation as "diagnostic only".
  Making a lock decision based on this operation is no longer allowed.

Discussed on tech-kern.
2010-06-24 07:54:46 +00:00
hannken 67c30e0802 Initialize the initial snap block list's count.
From Antti Kantee <pooka@netbsd.org>.
2010-06-02 09:56:59 +00:00
pooka 5e918e3804 Add a comment describing an observed boom-crash-burn problem in
the code.  Fixing it will require a full tank of gas, half a pack
of cigarettes, sunglasses, darkness, and most importantly:
someone else.
2010-05-25 11:02:07 +00:00
dbj 1da7b01e3b switch from 4 clause to 2 clause BSD license. 2010-04-24 19:58:13 +00:00
pooka 0c20c076ce Enforce RLIMIT_FSIZE before VOP_WRITE. This adds support to file
system drivers where it was missing and fixes one buggy
implementation.  The arguably weird semantics of the check are
maintained (v_size vs. va_bytes, overwrite).
2010-04-23 15:38:46 +00:00
hannken bceb3e2b5b Add fstrans transactions to ufs_close(), ufs_getattr(), ufs_chmod()
and ufs_chown().  These functions change file system state.
2010-04-13 09:27:58 +00:00
pooka 242bf1c3e7 Stop exposing fifofs internals and leave only fifo_vnodeop_p visible. 2010-03-29 13:11:32 +00:00
hannken afab4c313c Allow ufs_inactive() while a file system is suspending. Removes a possible
deadlock between vrele() and ffs_sync() during suspension.
2010-03-15 09:20:10 +00:00
pooka d1d083d372 fs_lfs.h is no longer necessary 2010-03-02 19:59:09 +00:00
pooka 5de34505ef load lfs syscalls in modload 2010-03-02 19:34:49 +00:00
pooka 96798ffe71 /*
 * XXX: Get extra reference to LFS vfsops.  This prevents unload,
 * but also prevents kernel panic due to text being unloaded
 * from below lfs_writerd.  When lfs_writerd can exit, remove
 * this!!!
 */
2010-03-02 19:30:34 +00:00
pooka 1a66ba1950 Remove fs_mfs.h from users because it is now unnecessary and don't
generate fs_mfs.h anymore.
2010-03-02 17:28:08 +00:00
pooka d6f18673d1 Make mfs_initminiroot() mandatory. Allows removing #ifdef MFS. 2010-03-02 17:20:02 +00:00
pooka 4f49fb9915 Don't generate unused fs_thefs.h headers. 2010-03-02 16:43:48 +00:00
pooka 2e6110dc37 Remove last #ifdef FFS. Do this by making lfs include ffs.
Could use UFS_OPS, but:

  1) the lfs kernel module depends on full ffs already anyway
  2) lfs is being split from ufs, so this will automatically
     go away soon
  3) chances of anyone wanting an lfs-only kernel are pretty slim
  4) i'm too lazy to figure out how to test ffs_snapgone() is
     still called properly if I change the call ;)
2010-03-02 15:18:22 +00:00
pooka 2dfc1bdbed scorch ufs_vnops.c cargo cult headers 2010-03-02 14:45:55 +00:00
mlelstv ef95b640b0 Store physical block numbers in superblock that point to the journal.
Calculate position of both commit headers correctly for disks with
large sectors.
Correct calculation of circular buffer size.
2010-02-27 12:04:19 +00:00
mlelstv 6d6d11f709 Replace individual queries for partition information with
new helper function.
Use this information to query physical sector sizes for WAPBL
instead of hardcoded defaults.
No longer limits physical sector sizes to 512 bytes.
2010-02-23 20:41:41 +00:00
mlelstv 03c7f48412 For the UVM_PAGE_TRKOWN test do not require that the relevant pages
must exist.
2010-02-21 13:55:58 +00:00
eeh 836736c39d Fix root filesystem support. 2010-02-18 01:14:00 +00:00
mlelstv 7974872552 Three changes in a single commit.
- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
  The code uses a complicated unity function that just makes the
  code difficult to understand.

- support larger sector sizes. Fix disk address computations
  to use DEV_BSIZE in the kernel as required by device drivers
  and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
2010-02-16 23:20:30 +00:00
mlelstv b10f49caa8 There is no code left that uses disk size data, so don't query it. 2010-02-11 19:50:34 +00:00
mlelstv b44bbb30f5 There is no code left that uses disk size data, so don't query it.
This also failed when querying the simulated block device from mfs.
Fixes PR kern/42782.
2010-02-11 00:06:16 +00:00
bouyer be891954ad - ufs_balloc_range(): on error, only PG_RELEASED the pages that were
  allocated to extend the file to the new size. Releasing all pages
  may release pages that contain previously-written data not yet flushed
  to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
  the new length, call uvm_vnp_setsize(). *_truncate() may have been
  called by *_write() in the error path (e.g. block allocation failure
  because of quota or file system full), and at this point v_writesize
  has been set to the desired size of the file and not reverted to the
  old size. Not adjusting v_writesize to the real size causes
  genfs_do_io() to write to disk past the real end of the file.
2010-02-07 17:12:40 +00:00
mlelstv bb2d547d2f Correct addressing of superblock updates. 2010-02-05 20:03:36 +00:00
mlelstv 748a0d77b1 Fix block shift to work with different device block sizes.
Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
2010-01-31 10:54:10 +00:00
mlelstv 5e340cd634 Replace individual queries for partition information with
new helper function.
2010-01-31 10:50:23 +00:00
mlelstv 928ded5f56 Fix block shift to work with different device block sizes. 2010-01-31 10:37:57 +00:00
mlelstv ba0d32752c Replace individual queries for partition information with
new helper function.
2010-01-31 10:30:40 +00:00
bouyer aa0e1a2ecf vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after it has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferencing a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs functions traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
2010-01-15 19:46:35 +00:00
pooka c3183f3251 The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase).  Plenty of mix'n match upper/lowercase has crept
into the tree since then.  Nuke the macros and convert all callsites
to lowercase.

no functional change
2010-01-08 11:35:07 +00:00
eeh 9d21c97885 Fix some more hangs and deadlocks. 2009-12-07 04:12:10 +00:00
tsutsui 5517f8a4c3 Add definitions for more reserved inodes. 2009-11-27 11:16:54 +00:00
yamt a646187eba use NULL instead of 0 for pointers 2009-11-18 12:22:48 +00:00
eeh bfd5cc9df2 This should fix a deadlock. 2009-11-17 22:49:24 +00:00
pooka 8454e30192 Create unwind log in global variable instead of automatic variable.
memory leak spotted by njoly's valgrind run
2009-11-17 17:08:57 +00:00
pooka 2e098f1f4e ... actually, define compat only for the kernel. Userlandia should
see only one version of the interfaces.
2009-11-05 17:16:36 +00:00
pooka 5207b24e34 Include compat/sys/time_types.h instead of compat/sys/time.h.
Fixes lint drama with interface name collisions.
2009-11-05 16:59:55 +00:00
pooka c584ccaa0d Include compat code by default. 2009-11-05 11:54:49 +00:00
bouyer 6b8161200e getcleanvnode(): don't vclean() the vnode if it has gained another
reference while we were getting the v_interlock.
vget(): attempt to prevent it from returning a clean vnode:
  if the vnode is being inactivated (by vrelel()), wait for
  vrelel() to complete (or return EBUSY if we can't wait), and return
  ENOENT if the vnode has been vclean'ed by vrelel()
Fix kern/41147 in a better way, hopefully fix other related race conditions.
2009-11-05 08:18:02 +00:00
hannken d35df7da38 Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
2009-11-04 09:45:05 +00:00
christos 2ef9c80a16 compile without COMPAT_50 2009-10-30 00:53:29 +00:00
eeh f50f807334 Fix up numoutput accounting. 2009-10-29 18:20:11 +00:00
christos fc0e85c95e PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS 2009-10-29 17:10:32 +00:00
pooka 447898cbb0 update i_uid and i_gid after chown 2009-10-21 17:37:21 +00:00
bouyer 6d07b400dc Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
2009-10-19 18:41:07 +00:00
hannken 4e246abd4c No longer abuse TAILQ internal data. 2009-10-15 10:05:48 +00:00
hannken df5d842a2c ufs_rmdir(): move fstrans_done() after vput(). No more unlinked and
zero-sized directory inodes in snapshots.
2009-10-14 09:40:27 +00:00
hannken 8deb3262b5 Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
2009-10-13 12:38:14 +00:00
rmind ae2795775d ufsdirhash_recycle(): modify ufs_dirhashmem atomically. 2009-10-05 23:48:08 +00:00
dholland 6f7fa47c46 Avoid nasal demons. Code of the form
   vput(vp);
   error = VFS_VGET(vp->v_mount, ...);

just isn't right. Because of vnode caching this *probably* never bit
anyone, except maybe under very heavy load, but still.
2009-09-28 00:39:03 +00:00
bouyer 7de71fb523 PR kern/41147: race between nfsd and local rm
Note that the race also exists between 2 nfs clients, one of them doing the rm.
In ufs_ihashget(), vget() can return a vnode that has been vclean'ed because
vget() can sleep. After vget returns, check that vp is still connected with
ip, and that ip still points to the inode we want. This fixes the NULL
pointer dereference in ufs_fhtovp() I've been seeing on a NFS server.

XXX I have no idea why using vput() instead of
vlockmgr(vp->v_vnlock, LK_RELEASE); vrele(vp); does not work.
2009-09-20 14:00:24 +00:00
bouyer b9440228c5 If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows booting a system with a corrupted WAPBL on / into
single-user mode, and so getting a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
2009-09-13 14:30:21 +00:00
bouyer 32992733fa Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
2009-09-13 14:13:23 +00:00
tsutsui e7713433d4 Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source. 2009-09-13 05:17:36 +00:00
tsutsui 80d52b1bc6 Use proper macro, some KNF, fix typo. 2009-09-12 14:59:59 +00:00
tsutsui 58c74e6160 Whitespace nits. 2009-09-12 11:35:46 +00:00
tsutsui a811b3a680 Migrate from u_intNN_t to uintNN_t. 2009-09-12 11:27:39 +00:00
tsutsui d592174fdd Reduce diffs a bit between ext2fs_reload() and ffs_reload(). 2009-09-12 02:50:38 +00:00
tsutsui 2620184bc7 Add a missed brelse(9) call after bread(9) in ext2fs_reload().
This may close PR kern/28712 (ext2fs hang on mount after fsck).
2009-09-12 02:32:14 +00:00
tsutsui 91f14b108d Pull a fix from ffs_vfsops.c rev 1.248:
> Fix bug introduced in revision 1.174(*) where a NULL fspec with an MNT_UPDATE
> command would always return EINVAL. This broke fsck on root, where fsck'ing
> a dirty root would always return an error causing rc to resort in a reboot.
(*) This is "Apply the NFS exports list rototill patch" change
    in ext2fs_vfsops.c rev 1.91.
2009-09-12 02:25:39 +00:00
tsutsui f551f24480 Pull a fix for mount function from ffs_vfsops.c rev1.186:
> Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
> instead of just vnode pointers.  Fixes erroneous "does not match mounted
> device" errors from mount(8) in the presence of MFS /dev, init.root, &c.
2009-09-12 01:43:52 +00:00
tsutsui f2831b63aa Fix botch around argument check in ext2fs_mount(). Taken from ffs_vfsops.c.
Fixes LOCKDEBUG panic which is the same one mentioned in PR kern/41078
on trying to mount_ext2fs against a raw device, while that panic
seems to have another root cause around module_autoload() in
sys/miscfs/specfs/spec_vnops.c:spec_open().
2009-09-11 15:59:07 +00:00
wiz 8b28c44203 Add missing parenthesis in #ifdef LFS_USE_B_INVAL.
From Henning Petersen in PR 41841.
2009-08-07 13:58:38 +00:00
pooka b828513eea Compensate v_numoutput & nestbuf for lfs's rather peculiar I/O habits. 2009-08-05 15:39:57 +00:00
pooka e7780eca66 remember to nestiobuf_done() too 2009-08-05 14:37:01 +00:00
pooka 307631b2c8 Use nestiobuf instead of homerolled equivalent. 2009-08-05 14:09:26 +00:00
bouyer 94fb626feb Fix previous: mutex_destroy() the right mutex 2009-08-02 20:50:33 +00:00
bouyer 9bbfba8140 Add missing mutex_destroy() before pool_cache_put(). Prevents a
"Mutex error: lockdebug_alloc: already initialized" panic.
2009-08-01 09:08:53 +00:00
pooka 7ec7a51957 Don't free extattr resources until it is certain that unmount
succeeds.  Also, "unmount system call" -> "unmount vfs operation"
in comment just so that our comments aren't 15+ years outdated.
2009-07-31 20:58:50 +00:00
pooka 7982dc729e Restore error behaviour bulldozed in rev 1.246.
might fix PR kern/41769
2009-07-23 01:10:02 +00:00
dholland b58d0cd33a typo in comment 2009-07-22 04:49:19 +00:00
dholland 22ba08b022 minor knf 2009-07-19 04:16:23 +00:00
dholland 0b98e26158 typo in comment 2009-07-19 03:39:14 +00:00
christos 48e6aff258 Fix bug introduced in revision 1.174 where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to resort in a reboot.
2009-07-06 16:07:18 +00:00
elad 009f5d2f88 Where possible, extract the file-system's access() routine to two internal
functions: the first checking if the operation is possible (regardless of
permissions), the second checking file-system permissions, ACLs, etc.

Mailing list reference:

	http://mail-index.netbsd.org/tech-kern/2009/06/21/msg005311.html
2009-07-03 21:17:40 +00:00
dholland effcf1af5c Convert 67 namei call sites to use namei_simple, in these functions:
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
2009-06-29 05:08:15 +00:00
ad fe924bec61 +/*
+ * NOTE: COORDINATE ON-DISK FORMAT CHANGES WITH THE FREEBSD PROJECT.
+ */
2009-06-28 09:26:18 +00:00