133 Commits

Author SHA1 Message Date
mlelstv
7974872552 Three changes in a single commit.
- drop the notion of frags (LFS fragments) vs fsb (FFS fragments)
  The code uses a complicated unity function that just makes the
  code difficult to understand.

- support larger sector sizes. Fix disk address computations
  to use DEV_BSIZE in the kernel as required by device drivers
  and to use sector sizes in userland.

- Fix several locking bugs in lfs_bio.c and lfs_subr.c.
2010-02-16 23:20:30 +00:00
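A minimal sketch of the disk address arithmetic the commit above describes, not the LFS code itself. The kernel addresses devices in DEV_BSIZE units (512 bytes on NetBSD), while userland counts in the device's sector size. The helper name `fsb_to_units` and its parameters are hypothetical, chosen only for illustration.

```python
DEV_BSIZE = 512  # kernel device block unit on NetBSD, in bytes


def fsb_to_units(fsb, fs_block_size, unit_size=DEV_BSIZE):
    """Convert a filesystem block number to an address in unit_size-byte units.

    Hypothetical helper: pass DEV_BSIZE for a kernel-style address, or the
    device sector size for a userland-style address.
    """
    if fs_block_size % unit_size != 0:
        raise ValueError("filesystem block size must be a multiple of the unit size")
    return fsb * (fs_block_size // unit_size)


# Filesystem block 10 with 4096-byte blocks on a device with 2048-byte sectors:
kernel_addr = fsb_to_units(10, 4096)          # counted in 512-byte units
userland_addr = fsb_to_units(10, 4096, 2048)  # counted in 2048-byte sectors
```

The same block yields two different numbers depending on the unit, which is why mixing the two conventions produces wrong disk addresses on devices with sectors larger than 512 bytes.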
pooka
2e098f1f4e ... actually, define compat only for the kernel. Userlandia should
see only one version of the interfaces.
2009-11-05 17:16:36 +00:00
pooka
5207b24e34 Include compat/sys/time_types.h instead of compat/sys/time.h.
Fixes lint drama with interface name collisions.
2009-11-05 16:59:55 +00:00
pooka
c584ccaa0d Include compat code by default. 2009-11-05 11:54:49 +00:00
christos
fc0e85c95e PR/42246: NAKAJIMA Yoshihiro: provide COMPAT_50 for LFS 2009-10-29 17:10:32 +00:00
dholland
0b98e26158 typo in comment 2009-07-19 03:39:14 +00:00
hannken
5d2bff060a Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write.  Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn().  If set, the caller
  intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
  may clear the buffer and runs copy-on-write.  Process possible errors
  from getblk() or fscow_run().  Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
2008-05-16 09:21:59 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
ad
b2fa822a33 The buffer LOCKED flag need not be under the protection of bufcache_lock,
BUSY is enough.
2008-02-15 13:30:56 +00:00
ad
e01dd1a1f8 Use pool_cache. 2008-01-03 19:28:48 +00:00
ad
4a780c9ae2 Merge vmlocking2 to head. 2008-01-02 11:48:20 +00:00
ad
7dad9f7391 Merge from vmlocking:
- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
2007-10-10 20:42:20 +00:00
ad
5c3b2b3f2d Merge ffs locking & brelse changes from the vmlocking branch. 2007-10-08 18:01:27 +00:00
perseant
9234ba6fd8 Change references to SEGM_W_DIROPS to SEGM_CKP, and replace the logic that
formerly used SEGM_W_DIROPS in lfs_segwrite() appropriately.  This prevents
a problem in which processes could get stuck in "buffers" sleep forever.
2007-05-16 19:11:37 +00:00
perseant
9be0ebd9da Install a new sysctl, vfs.lfs.ignore_lazy_sync, which causes LFS to ignore
the "smooth" syncer, as if vfs.sync.*delay = 0, but only for LFS.  The
default is "on", i.e., ignore lazy sync.

Reduce the amount of polling/busy-waiting done by lfs_putpages().  To
accomplish this, copied genfs_putpages() and modified it to indicate which
page it was that caused it to return with EDEADLK.  fsync()/fdatasync()
should no longer ever fail with EAGAIN, and should not consume huge
quantities of cpu.

Also, try to make dirops less likely to be written as the result of a
VOP_PUTPAGES(), while ensuring that they are written regularly.
2007-04-17 01:16:46 +00:00
ad
9abeea588a Replace some uses of lockmgr() / simplelocks. 2007-02-15 15:40:50 +00:00
perseant
2ac2813b6e Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
2006-09-28 23:08:23 +00:00
perseant
8c43e08b21 Don't re-mark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock.  This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
2006-09-15 18:50:49 +00:00
yamt
9d3e3eab23 merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
2006-09-15 15:51:12 +00:00
perseant
437e855235 Changes to help the roll-forward agent, to wit:
* Mark being-deleted files in the Ifile so we can finish deleting them
  at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
  the cleaner, rather than relying solely on the cleaner's estimation of
  whether it should clean or not.
* Note partial segments written by a user agent (in particular,
  fsck_lfs) so that repeated rolls forward don't interfere with one
  another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
  for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
  partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
  view of the filesystem.  In particular, the accounting for the segment
  we are writing the inode into must be correct, and the accounting for
  the segment that inode used to reside in must be correct.  Rather than
  just rewriting the inode if we wrote it wrong, rewrite the necessary
  ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
  avoiding yet another problem with the "wait for the cleaner" error
  return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
2006-09-01 19:41:28 +00:00
martin
12cf319c62 Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes lfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
2006-08-06 12:34:12 +00:00
martin
b4cb63a646 Make filehandles opaque to userland 2006-07-31 16:34:42 +00:00
perseant
20227e112e Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
2006-07-20 23:16:50 +00:00
martin
3fb505e6b2 Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
2006-07-13 22:05:52 +00:00
martin
a3b5baed42 Fix alignment problems for fhandle_t, exposed by gcc4.1.
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
2006-07-13 12:00:24 +00:00
perseant
1c57171fe3 Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again.  Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4).  Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
2006-06-24 05:28:54 +00:00
elad
fc9422c9d9 integrate kauth. 2006-05-14 21:31:52 +00:00
perseant
285f68c114 Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once.  Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
2006-05-12 23:36:11 +00:00
perseant
ce053245eb Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree".  The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done.  This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
2006-05-04 04:22:55 +00:00
perseant
481da54fc1 Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
2006-04-30 21:19:42 +00:00
perseant
0268059112 Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing.  This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small).  Discovered by using the aforementioned
regression test.
2006-04-17 20:02:34 +00:00
perseant
81ded5df65 Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
2006-04-13 23:46:28 +00:00
perseant
07ebfab840 Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
2006-04-10 21:20:19 +00:00
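A rough sketch of a word-at-a-time bitmap search of the kind the commit above mentions, under stated assumptions: the bitmap is a list of 32-bit words, a set bit means "free", and `start` plays the role of fs->lfs_freehd as the lowest index worth searching. The function name and the bit convention are illustrative and not taken from the LFS source.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def first_free(bitmap, start):
    """Return the lowest free index >= start, or None if there is none.

    bitmap is a list of 32-bit words; bit i of word w set means that
    index w * 32 + i is free (an assumed convention for this sketch).
    """
    w = start // WORD_BITS
    if w >= len(bitmap):
        return None
    # Ignore bits below `start` in the first word examined.
    word = bitmap[w] & ~((1 << (start % WORD_BITS)) - 1) & WORD_MASK
    while True:
        if word:
            lowest = word & -word  # isolate the lowest set bit
            return w * WORD_BITS + lowest.bit_length() - 1
        w += 1  # a zero word skips 32 indices in one comparison
        if w >= len(bitmap):
            return None
        word = bitmap[w]


bitmap = [0b1001, 0, 1 << 5]
```

Treating indices as unsigned throughout also sidesteps the kind of signedness assumption the commit message reports fixing.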
perseant
39ce23c169 Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
2006-04-08 00:26:34 +00:00
perseant
ff84dd347a Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
2006-04-08 00:16:56 +00:00
perseant
d28248e84e Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
2006-04-07 23:44:14 +00:00
perseant
dddf5c5171 Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
  disk than clean segments.  When we reach the danger line,
  lfs_gop_write() now returns EAGAIN.  The caller of VOP_PUTPAGES(), if
  it holds the segment lock, drops it and waits for the cleaner to make
  room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
  a page busy blocks on the cleaner while the cleaner blocks on the
  segment lock while lfs_putpages blocks on the page).
2006-03-24 20:05:32 +00:00
tls
a67eab5ee4 From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space.  Now we're not.
2006-03-17 23:21:01 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
christos
3544d898ac split out lfs_itimes(). It is used in fsck_lfs. 2005-09-13 04:13:25 +00:00
christos
a12024da06 Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
2005-09-12 16:24:41 +00:00
christos
0b0eb1328b Don't overload MAXNAMLEN, use a separate constant for each filesystem type. 2005-08-23 08:05:13 +00:00
yamt
84c9e5bbc1 whitespace. 2005-08-22 09:08:17 +00:00
christos
b0e192f2b6 change ino_t to u_int32_t for syscall compatibility. 2005-08-22 08:53:03 +00:00
christos
bce5269120 Move extern kernel variable declarations into a _KERNEL-protected section
so that they don't pollute userland's namespace.
2005-07-31 20:18:32 +00:00
christos
273df63602 - sprinkle const
- avoid shadow variables.
2005-05-29 21:25:24 +00:00
perseant
2ecd1730c0 Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time).  Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
2005-05-20 19:48:25 +00:00
perseant
2f695b5476 Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing.  Tested in both
directions, and everything appears to work happily, but ymmv.
2005-04-23 19:47:51 +00:00
perseant
f4a7694fc9 Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem).  This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
2005-04-19 20:59:05 +00:00
perseant
5923fa20f1 Make userland compile again. 2005-04-16 19:52:09 +00:00