Commit Graph

799 Commits

Author SHA1 Message Date
hannken
d84a65dd80 VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
2011-11-14 18:35:12 +00:00
christos
a96ee3ab95 use getdiskinfo() 2011-11-13 23:10:34 +00:00
hannken
34f54c83be As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
2011-10-07 09:35:04 +00:00
chs
4ce2757928 strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year.  add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed.  this fixes PR 45369.
2011-09-20 14:01:32 +00:00
christos
b866ba6e1a fix sign-compare warnings 2011-08-14 12:37:09 +00:00
hannken
661fcc7b37 ffs_copyonwrite(): If the write is to the in-file-system journal
there is no need to lock and check the snapshots.
2011-07-01 14:28:21 +00:00
manu
d8abff28ef Implement extended attribute listing for UFS1.
Modify lsextattr(8) so that it does not expect each attribute name to be
prefixed by its length. This enable extattr_list_(file|link|fd) to
return a buffer matching its documentation. This also makes the interface
similar to what Linux and FUSE do, which is nice for interoperability.

Note that since we had no EA implementation supporting listing, we do
not break anything.
2011-06-27 16:34:47 +00:00
mrg
ff721708ed fix an off by one array overflow found by GCC 4.5.3. 2011-06-22 04:01:33 +00:00
manu
448e1c49b2 Add mount -o extattr option to enable extended attributs (corrently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attriutr size can be modified.
2011-06-17 14:23:50 +00:00
hannken
d296304e60 Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode.  Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
2011-06-16 09:21:02 +00:00
rmind
e225b7bd09 Welcome to 5.99.53! Merge rmind-uvmplock branch:
- Reorganize locking in UVM and provide extra serialisation for pmap(9).
  New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
  the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
  Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
  kernel-lock on some ports).  Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
2011-06-12 03:35:36 +00:00
bouyer
a3a7248ce7 Fix bad cut'n'paste in copyright. Pointed out by dyoung@ 2011-06-07 14:56:12 +00:00
hannken
f7e12f18b3 Revert previous commit. Locking the snapshot vnode while the file system
is suspended extends the suspension until the vnode gets unlocked by
the caller of ffs_snapshot().

Resuming the file system before expunging all snapshots and syncing the
snapshot creates races and deadlocks with journaling file systems at least.
2011-05-08 18:37:15 +00:00
hannken
b28fa91685 Before expunging all snapshots take the snapshot lock and resume the file
system as this is sufficient for the remaining operations.

Reduces the time the file system is suspended and should make this time
independent of the number of snapshots already present.
2011-04-29 09:45:15 +00:00
hannken
bb3ca01e60 Cleanup ffs fsync and make devices on wapbl enabled file systems work here:
- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
  vflushbuf().  This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
  spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
2011-04-27 07:24:52 +00:00
hannken
7c9d6febb5 ffs_snapshot(): return an error if the node is an invalid snapshot. 2011-04-23 08:23:52 +00:00
hannken
36046fc79f Try to keep snapshot indirect blocks contiguous.
This speeds up snapshot creation by a factor of ~3 and reduces
the file system suspension time by a factor of ~5.
2011-04-23 07:36:02 +00:00
hannken
21d54ad389 Preallocate all cylinder group blocks so we no longer redo ~50% of
the cylinder groups while the file system is suspended.
This was removed in error with Rev 1.16.

From Manuel Bouyer <bouyer@netbsd.org> via tech-kern.
2011-04-18 07:36:13 +00:00
hannken
186b31b4b7 ffs_fsync: no need for wapbl_vptomp() here -- vnode is always VREG. 2011-04-15 15:54:11 +00:00
mlelstv
8e9bf29753 Don't abort when APPLE_UFS autodetection cannot read the apple ufs label
due to sector size or alignment problems. Autodetection is only a safety
measure, you should mark the filesystem type in the BSD disklabel.
2011-03-27 08:04:50 +00:00
bouyer
063f96f3c2 merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
2011-03-06 17:08:10 +00:00
rmind
f09748f46c {ffs_nodealloccg,ext2fs_nodealloccg,ext2fs_mapsearch}: use XOR and ffs()
to find free bits in the inode and block bitmaps, instead of the loop.

Obtained from FreeBSD (changes by jhb).
2011-03-06 04:46:26 +00:00
hannken
05e91bfee8 fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before
        the snapshot gets created.

With this change dump(8) no longer dumps the zero-sized, but named
snapshot it is working on.  Same applies to fsck_ffs(8).
2011-02-24 09:38:57 +00:00
dyoung
062b9b2f31 Initialize blkno to 0 right before the snapblkaddr() call that GCC does
not understand so that if ffs_copyonwrite() sprouts a new code path that
does not initialize blkno, the compiler has the chance to reveal it.
2011-02-23 17:05:33 +00:00
hannken
0ca683e1ba Quiesce CC ('blkno' may be used uninitialized in this function). 2011-02-23 08:53:21 +00:00
he
0f003a45cb Move blocks_in_journal() in under #ifndef FFS_NO_SNAPSHOT, all uses
are under that ifdef anyway; this allows build with FFS_NO_SNAPSHOT defined.
2011-02-22 20:25:54 +00:00
hannken
296ec9e30e Change the snapshot lock:
- No need to take the snapshot lock while the file system is suspended.
- Allow ffs_copyonwrite() one level of recursion with snapshots locked.
- Do the block address lookup with snapshots locked.
- Take the snapshot lock while removing a snapshot from the list.

While hunting deadlocks change the transaction scope for ffs_snapremove().
We could deadlock from UFS_WAPBL_BEGIN() with a buffer held.
2011-02-21 09:29:21 +00:00
bouyer
e09a28661e Initialize error in snapshot_expunge(); if the list is empty error would
be returned uninitialized. t_snapshot_v2 was failing for me when
librumpffs was compiled DGB=-g.
No idea why gcc didn't catch this ...
2011-02-18 14:48:54 +00:00
hannken
28125a4542 Revert rev. 1.101. Dead snapshots would hang around until unmount.
Adresses PR #44568 (WAPBL doens't play nice with snapshots).
2011-02-18 08:39:13 +00:00
hannken
6f85587813 Refine the scope of WAPBL transactions so we should no longer get
a "wapbl_flush: current transaction too big to flush" panic when
creating or removing snapshots on larger logging disks.

Adresses PR #44568 (WAPBL doens't play nice with snapshots).
2011-02-16 19:43:50 +00:00
hannken
53b57e3385 Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode.  This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
2010-12-27 18:49:42 +00:00
mlelstv
6c899f7536 For update mounts the root vnode is already in use and we must not
free it. Since the mount persists even when the update fails,
this is not a problem either.
2010-12-24 13:38:57 +00:00
mlelstv
5eee906941 mount(2) doesn't remove vnodes from the freelist in the error path,
so that they get reused with a invalid pointer to a mount structure.

As a workaround, free the vnodes used to create the in-filesystem journal
immediately.
2010-12-23 14:43:37 +00:00
matt
6a66466f0c Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits.  Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
2010-12-20 00:25:23 +00:00
hannken
3b57b82b8f Keep a reference to the snapshot vnode until it gets removed from the
snapshot list.
2010-12-12 10:29:25 +00:00
hannken
f29d5492f8 syncsnap: Use bbusy() to take a buffer from v_dirtyblkhd. 2010-12-12 10:28:22 +00:00
hannken
559469276d ffs_reclaim: don't free an already free inode. This may happen when
ffs_fhtovp() gets a free inode and releases it.
2010-08-12 07:41:49 +00:00
pooka
6e5ca1ed9e add a linefeed to the previous 2010-08-09 17:12:18 +00:00
pooka
5140c8efdf Return error if we try to mount a file system with block size > MAXBSIZE.
Note: there is a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them.  However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).

Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass).  I don't know how we're going to
translate this into an easy regression test, though.  Maybe with
a hacked newfs?
2010-08-09 15:50:13 +00:00
hannken
3a7edffde9 ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
2010-07-28 11:03:47 +00:00
hannken
fb62bef947 Make holding v_interlock mandatory for callers of vget().
Announced some time ago on tech-kern.
2010-07-21 17:52:09 +00:00
hannken
1423e65b26 Clean up vnode lock operations pass 2:
VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
2010-06-24 12:58:48 +00:00
hannken
67c30e0802 Initialize the initial snap block list's count.
From Antti Kantee <pooka@netbsd.org>.
2010-06-02 09:56:59 +00:00
dbj
1da7b01e3b switch from 4 clause to 2 clause BSD license. 2010-04-24 19:58:13 +00:00
pooka
242bf1c3e7 Stop exposing fifofs internals and leave only fifo_vnodeop_p visible. 2010-03-29 13:11:32 +00:00
mlelstv
ef95b640b0 Store physical block numbers in superblock that point to the journal.
Calculate position of both commit headers correctly for disks with
large sectors.
Correct calculation of circular buffer size.
2010-02-27 12:04:19 +00:00
mlelstv
6d6d11f709 Replace individual queries for partition information with
new helper function.
Use this information to query physical sector sizes for WAPBL
instead of hardcoded defaults.
No longer limits physical sector sizes to 512 bytes.
2010-02-23 20:41:41 +00:00
mlelstv
03c7f48412 For the UVM_PAGE_TRKOWN test do not require that the relevant pages
must exist.
2010-02-21 13:55:58 +00:00
mlelstv
b44bbb30f5 There is no code left that uses disk size data, so don't query it.
This also failed when querying the simulated block device from mfs.
Fixes PR kern/42782.
2010-02-11 00:06:16 +00:00
bouyer
be891954ad - ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
  may release pages that contains previously-written data not yet flushed
  to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
  the new length, call uvm_vnp_setsize(). *_truncate() may have been
  called by *_write() in the error path (e.g. block allocation failure
  because of quota of file system full), and at this point v_writesize
  has been set to the desired size of the file and not reverted to the
  old size. Not adjusting v_writesize to the real size cause
  genfs_do_io() to write to disk past the real end of the file.
2010-02-07 17:12:40 +00:00
mlelstv
bb2d547d2f Correct addressing of superblock updates. 2010-02-05 20:03:36 +00:00
mlelstv
748a0d77b1 Fix block shift to work with different device block sizes.
Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utitlies share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
2010-01-31 10:54:10 +00:00
mlelstv
5e340cd634 Replace individual queries for partition information with
new helper function.
2010-01-31 10:50:23 +00:00
pooka
c3183f3251 The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase).  Plenty of mix'n match upper/lowercase has creeped
into the tree since then.  Nuke the macros and convert all callsites
to lowercase.

no functional change
2010-01-08 11:35:07 +00:00
hannken
d35df7da38 Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
2009-11-04 09:45:05 +00:00
bouyer
6d07b400dc Remove closes 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the booring work !
2009-10-19 18:41:07 +00:00
hannken
4e246abd4c No longer abuse TAILQ internal data. 2009-10-15 10:05:48 +00:00
hannken
8deb3262b5 Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
2009-10-13 12:38:14 +00:00
bouyer
b9440228c5 If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This allows to still boot single-user a system with a corrupted
WAPBL on /, and so get a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
2009-09-13 14:30:21 +00:00
bouyer
32992733fa Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
2009-09-13 14:13:23 +00:00
tsutsui
e7713433d4 Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source. 2009-09-13 05:17:36 +00:00
pooka
7ec7a51957 Don't free extattr resources until it is certain that unmount
succeeds.  Also, "unmount system call" -> "unmount vfs operation"
in comment just so that our comments aren't 15+ years outdated.
2009-07-31 20:58:50 +00:00
pooka
7982dc729e Restore error behaviour bulldozed in rev 1.246.
might fix PR kern/41769
2009-07-23 01:10:02 +00:00
christos
48e6aff258 Fix bug introduced in revision 1.174 where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error causing rc to resort in a reboot.
2009-07-06 16:07:18 +00:00
dholland
effcf1af5c Convert 67 namei call sites to use namei_simple, in these functions:
check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
2009-06-29 05:08:15 +00:00
ad
fe924bec61 +/*
+ * NOTE: COORDINATE ON-DISK FORMAT CHANGES WITH THE FREEBSD PROJECT.
+ */
2009-06-28 09:26:18 +00:00
ad
a94f2ab36f Reserve a bit for FS_GJOURNAL (from FreeBSD). 2009-05-12 21:01:02 +00:00
elad
9e9887cc59 Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accomodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

	http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
2009-05-07 19:26:08 +00:00
elad
54bf8cc67a Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call)).

Proposed with no objections on tech-kern@:

	http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
2009-04-25 18:53:44 +00:00
sborrill
71d4bf3caa Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().

Kudos to rump_ffs!
2009-04-25 08:32:32 +00:00
tsutsui
d779b85d3e Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
2009-04-18 14:58:02 +00:00
ad
393ca6e076 fsync:
- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like now dead /usr/sbin/update.
  It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
  cheerfully ignored the flush request due to the fsync bug. Such inodes
  remained dirty and were repeatedly re-examined by the syncer until
  vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
  most cases there was not flush to perform. While not a bug, this wasted
  CPU cycles because a TAILQ_NEXT would have sufficed.
2009-03-29 10:29:00 +00:00
ad
2600da8765 ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
2009-03-21 14:35:48 +00:00
cegger
e2cb85904d bcopy -> memcpy 2009-03-18 17:06:41 +00:00
cegger
c363a9cb62 bzero -> memset 2009-03-18 16:00:08 +00:00
dholland
2fe837fcf9 typo in comment 2009-02-23 03:01:13 +00:00
ad
59fcf21389 PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
2009-02-22 20:28:05 +00:00
ad
430f67aa17 PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
  buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
  retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
  logging). Presently they didn't hit the disk for read-only files or
  devices until the file system was unmounted. It would be better to trickle
  the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
  problems when logging and non-logging file systems are intermixed,
  with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
  has been requested by ioflush. Previously, we could try hundreds of log
  sync operations a second due to inode update activity, causing the syncer
  to fall behind and metadata updates to be serialized across the entire
  file system. Instead, burst out metadata and log flushes at a minimum
  interval of every 10 seconds on an active file system (happens more often
  if the log becomes full). Note this does not change the operation of
  fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
  vfs_wapbl.c.
2009-02-22 20:10:25 +00:00
ad
74d10dbea4 PR kern/40469 5.0_BETA/amd64 INSTALL kernel panics when installing on log-enabled filesystems
PR kern/40470 WAPBL corrupts ext2fs

Don't touch inodes at all unless VOP_FSYNC(). Might fix the ext2fs problem,
I am not sure.
2009-02-01 17:36:43 +00:00
yamt
18be80bfbe 0 -> NULL 2009-01-31 09:22:08 +00:00
yamt
e52a72295f wapbl_log_position: 1 -> MNT_WAIT 2009-01-31 09:14:15 +00:00
lukem
c5eb4ab601 fix -Wsign-compare issues 2009-01-18 11:56:51 +00:00
pooka
a5ae82a57e Revert 1.101, author did not provide a justification. 2009-01-15 21:26:03 +00:00
christos
461a86f9bd merge christos-time_t 2009-01-11 02:45:45 +00:00
hannken
fc6d5c7578 Remove superfluous "vp->v_vnlock = &vp->v_lock".
Observed by: YAMAMOTO Takashi <yamt@netbsd.org>
2009-01-03 15:29:08 +00:00
christos
437cf02e63 Don't try to ffs_update VT_NON vnodes 2008-12-28 16:27:00 +00:00
cegger
f1b926ed8b ffs_update: sprinkle KASSERTs 2008-12-23 11:32:08 +00:00
ad
f1ec31c6b1 Add a comment. 2008-12-22 12:18:48 +00:00
ad
0472423773 PR kern/40246 current panics when removing swap devices
Someone was smoking crack when they decided to unconditionally OR FSYNC_VFS
into the flags for block devices.
2008-12-22 11:46:33 +00:00
ad
83f7350f6d PR kern/40210 5.0 BETA WAPBL related crash 2008-12-21 10:44:32 +00:00
hannken
e1e7ee242d Restore a line removed by mistake with the last commit.
Should fix PR 40225 panic: indiracct: missing indir.
2008-12-19 11:36:10 +00:00
cegger
9b87d582bd kill MALLOC and FREE macros. 2008-12-17 20:51:31 +00:00
hannken
59f928fb25 ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
                     beyond file system size.  Needed to read the snapblklist
                     on mount.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
2008-12-07 19:51:07 +00:00
hannken
8e313cc27b Revert previous -- ALL reads are from kernel space.
Still open: PR kern/37425: fss_snapshot_mount panic during fsck.
2008-12-07 18:55:58 +00:00
hannken
7dbaf06e71 ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Allow the kernel to read beyond file system size.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
2008-12-07 10:01:09 +00:00
joerg
9a364d2ed3 Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
2008-12-06 20:05:55 +00:00
joerg
6cdfaeec55 Revert last. Conditionalize variables on FFS_EI. 2008-12-01 13:45:51 +00:00
cegger
9c45aac9d8 build fix: remove unused variables 2008-12-01 13:33:39 +00:00
joerg
740a2c079c ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
2008-12-01 13:22:06 +00:00
joerg
dfd7714b8f Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
2008-11-30 16:20:44 +00:00
ad
bed0008a9a Remove #ifdef LFS from the ufs code. 2008-11-13 11:09:45 +00:00
joerg
e09eb39f96 wapbl_replay_free needs the reply to have been stopped, so make sure
that the changes happen in the right order. Reported by veego@
2008-11-11 21:02:54 +00:00
joerg
3fbdfc8af9 Reduce internals of WAPBL exposed to the rest of the system. 2008-11-10 20:12:13 +00:00
joerg
ecbfc2933c Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
2008-11-06 22:31:08 +00:00
joerg
564d6ccca2 Fix indentation. 2008-10-30 17:03:09 +00:00
hannken
06529f4f6d Correct previous.
- Count frags, not blocks to get the file system size.
- Cannot use blksize() here, it depends on vnode size.
- Correctly update xfersize on short reads.
2008-10-23 17:16:24 +00:00
hannken
02630b7919 When computing the requests hard limit in ffs_snapshot_read()
use the file system size, not the size of the snapshot vnode.
2008-10-23 14:25:21 +00:00
hannken
44f3404f57 Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page  while  another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
2008-10-10 09:21:58 +00:00
pooka
ed8826a34e Remove some of my debugging code which was not meant to be committed
in the wapbl merge.
2008-09-23 15:27:59 +00:00
freza
4ebea2e5f3 Revert previous, pooka@ points out it's wrong. 2008-09-21 23:22:00 +00:00
freza
262577121d WAPBL: in '%s: replaying log to disk' message use the path we're
trying to mount on instead of the misleading last-mounted-on
    path. Reported by jmcneill.
2008-09-21 21:08:22 +00:00
hannken
72538db7ba Adjust some WAPBL transactions:
- Put transaction inside cgaccount() to simplify caller.
- No vget() / vrele() inside a transaction.
2008-09-08 14:22:31 +00:00
joerg
fe12f360b8 Move successful removal of unreferenced inodes under WAPBL_DEBUG to not
spam the console.

OK simon@
2008-09-08 03:16:43 +00:00
hannken
15d57e4f81 Ffs_snapshot() has become a huge monster over the time. Break it into
helper functions to enhance readability.  Adjust comments to reality
and test the main error paths.

While here, expand and remove the last FreeBSD->NetBSD conversion macros.

No functional change intended.
2008-09-02 08:51:46 +00:00
hannken
ced96ac82f ffs_truncate() always runs with journal locked. Propagate this information
to VOP_PUTPAGES().

Report from Lars Nordlund on current-users@
2008-08-30 08:25:53 +00:00
hannken
86492b8668 Sync the just created snapshot to disk.
Invalidate short ( < fs_bsize ) buffers.  We will always read full
size buffers later.

Should fix PR #39402
2008-08-25 13:34:53 +00:00
hannken
24b2cb27c8 Add missing vput() for logvp.
Fixes PR #39400
2008-08-24 15:33:37 +00:00
hannken
83fe3aa0dd Merge the _ufs1 and _ufs2 variants of the expunge and accounting functions.
Remove some unneeded UFS_FSNEEDSWAP().

Saves ~250 lines of redundant code.
2008-08-24 09:51:47 +00:00
hannken
88400c4373 Add snapshot support for logging ffs file systems.
- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
  genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
  inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>,  Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
2008-08-22 10:48:22 +00:00
hannken
9d026d04ef ffs_suspendctl: make sure everything is on disk and the on disk log is empty. 2008-08-15 17:32:32 +00:00
hannken
93d3ff0e45 Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots.  With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
  Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
  writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
2008-08-12 10:14:37 +00:00
hannken
601ab263e0 Do not call UFS_WAPBL_*() when ffs_freefile() is acting on a snapshot.
While here replace the test for VBLK with a convenience variable.
2008-08-06 12:54:26 +00:00
pooka
b43847b66c zu, not zd, to print size_t 2008-08-05 13:39:29 +00:00
simonb
ba0675032b Only allow WAPBL to operate with UFS2 style superblocks.
Problem reported by Takeshi Nakayama.
2008-08-04 15:55:11 +00:00
simonb
53aeec60eb When checking if there's enough space at the end of a partition,
compare bytes vs bytes, not sectors vs bytes.

Problem discovered and fix tested by Michael Hitch.
2008-08-02 02:23:51 +00:00
oster
9921f79434 Make MSDOS filesystems work again after WAPBL merge. Fixes a quite
repeatable panic in fstrans_getstate() found while searching for a
different USB bug.  Also makes the code somewhat more readable.

Patch from Juergen Hannken-Illjes with a small rearrangement from me.

Approved by: hannken
2008-07-31 23:49:50 +00:00
hannken
85271ee0ad Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
2008-07-31 15:37:56 +00:00
hannken
fb6f2c9b30 Resolve a deadlock when fs_nodealloccg() initializes more inodes on
an UFS2 file system.  With the current cylinder group buffer busy it
calls ffs_getblk().  This runs through copy-on-write and may need the
current cylinder group buffer to allocate a new block for the snapshot.

While here write the cylinder group buffer synchronously after
cg_initediblk was changed because fsck_ffs will trust it.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
2008-07-31 09:35:09 +00:00
simonb
0800f2fb9f Be consistent with #define<tab>. 2008-07-31 08:44:21 +00:00
simonb
36d65f1138 Merge the simonb-wapbl branch. From the original branch commit:
Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
   journaling code.  Originally written by Darrin B. Jewell while
   at Wasabi and updated to -current by Antti Kantee, Andy Doran,
   Greg Oster and Simon Burge.

OK'd by core@, releng@.
2008-07-31 05:38:04 +00:00
hannken
4685b65ada ffs_snapshot():
Release allocated indir blocks on non-softdep file systems instead
    of writing them twice.
    It is sufficient to clean dirty data pages to avoid UBC inconsistencies.

ffs_snapblkfree() and wrsnapblk():
    If a snapshots effective link count is zero there is no need
    to use synchronous writes.

ffs_copyonwrite():
    Defer locking the snapshots until there is a need to copy the block.

wrsnapblk():
    Use vn_rdwr() instead of bwrite() to write to the snapshots.
2008-07-30 10:09:30 +00:00
hannken
f67742b3c8 expunge_ufs*(): Use the buffer cache to update the inodes on the snapshot like
the rest of snapshot creation does.
2008-07-15 08:20:56 +00:00
simonb
27ae933bee Fix potential 32-bit overflow problem in the blockpref code.
mlelstv@ points out FreeBSD fixed the same thing a couple of years
ago - here's the commit message they used on rev 1.127:

   Fixes a bug that caused UFS2 filesystems bigger than 2TB to
   prematurely report that they were full and/or to panic the kernel
   with the message ``ffs_clusteralloc: allocated out of group''.

   Submitted by:   Henry Whincup <henry@jot.to>
2008-07-11 05:31:44 +00:00
rumble
28f5ebd853 Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
2008-06-28 01:34:05 +00:00
reinoud
f6a70673ba Mark a buffer busy in getnewbuf() when it came from the pool_cache since
its not on a free list.

Also change buf_init() to not automatically mark buffers `busy' since this
only makes sense for bufcache buffers.

Mark all buf_init'd buffers 'busy' on the places where they ought to be
flagged as such to not confuse the buffer cache.

Fixes PR 38923.
2008-06-17 14:53:10 +00:00
ad
e89db9644e When setting DONE on the buffer, assert that there are no waiters in
biowait().
2008-06-04 17:46:21 +00:00
hannken
15e90e4bbc ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
2008-06-03 09:47:49 +00:00
ad
c35a9dfad1 Put a TNF copyright on it. 2008-05-31 21:39:13 +00:00
ad
3592ae4882 XXX softdep:
If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
2008-05-31 21:37:08 +00:00
hannken
336f2a69f4 ffs_copyonwrite(): stop abusing ffs_balloc() to get a block address.
Use ufs_getlbns()/bread() instead.
Saves some reads and removes deep recursion with possible deadlock
when ffs_balloc() runs copy-on-write on the buffer returned.
2008-05-29 10:00:50 +00:00
hannken
5d2bff060a Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write.  Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn().  If set the caller
  intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
  may clear the buffer and runs copy-on-write.  Process possible errors
  from getblk() or fscow_run().  Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
2008-05-16 09:21:59 +00:00
rumble
a1221b6d4a Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
2008-05-10 02:26:09 +00:00
ad
42d0626726 PR kern/38141 lookup/vfs_busy acquire rwlock recursively
Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
  sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
  and is only ever write locked in dounmount(). A write hold can't be taken
  on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
  example when going r/o -> r/w, and is only present to serialize updates.
  In order to take this lock, a read hold must first be taken on
  mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
2008-05-06 18:43:44 +00:00
ad
e071d39c84 - Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
2008-05-05 17:11:16 +00:00
ad
928a6b2096 PR kern/38135 vfs_busy/vfs_trybusy confusion
The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
2008-04-30 12:49:16 +00:00
ad
baa3395f8f PR kern/38057 ffs makes assuptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
2008-04-29 18:18:08 +00:00
hannken
2fd9e21242 Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
2008-04-17 09:52:47 +00:00
ad
0701eb1ec7 newdirrem: if the number of deletes in progress is getting too high, start
pushing the syncer before considering rate limiting the deletes. We hold
vnodes locked and it's likely that the syncer will try to lock them while
flushing, leading to the syncer and remover proceeding in lockstep and
making very little forward progress. XXX this is not a solution.
2008-04-11 16:25:38 +00:00
ad
be04ac4896 Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
2008-03-27 19:06:51 +00:00
matt
e2ca3f7504 Merge all the *different* definitions of bufqueues into one common one. 2008-02-20 17:13:29 +00:00