Commit Graph

1495 Commits

Author SHA1 Message Date
ad 93a5fa21eb Destroy the fraglock on unmount. 2007-02-15 17:47:56 +00:00
ad 9abeea588a Replace some uses of lockmgr() / simplelocks. 2007-02-15 15:40:50 +00:00
ad b07ec3fc38 Merge newlock2 to head. 2007-02-09 21:55:00 +00:00
elad 2ddc81b57b Add missing ')'. Noted by Paul Goyette. 2007-02-07 05:54:42 +00:00
bouyer 0ce1424e8f in ufs_dirremove swap ep->d_reclen before use if needed (affect UFS_DIRHASH
only).
While there remove an unneeded swap before compare against 0 in ufs_direnter().
Both pointed out by Pawel Jakub Dawidek on tech-kern@, thanks !
2007-02-06 20:49:20 +00:00
hannken 4d607243ba Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
2007-01-29 15:42:50 +00:00
hubertf eda05c6413 Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
2007-01-29 01:52:43 +00:00
hannken 1b9c6382e3 New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE.  This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
2007-01-19 14:49:08 +00:00
isaki be241a0e51 Correct indent. 2007-01-07 09:33:18 +00:00
elad 1e70d64818 Consistent usage of KAUTH_GENERIC_ISSUSER. 2007-01-04 16:55:29 +00:00
perseant df25fd6968 Change VONWORKLST handling to better match its other uses; in particular,
check memq and clear VWRITEMAPDIRTY at the same time.
2007-01-03 02:42:23 +00:00
elad d4e1860d1a Add KAUTH_SYSTEM_CHSYSFLAGS so we can get rid of the last three
securelevel references (ufs, ext2fs, tmpfs).

Intentionally undocumented.
2007-01-02 11:18:56 +00:00
yamt 90b1120ae3 ufs_readdir: start from offsets known to be valid,
rather than assuming users feed us valid offsets.
2006-12-26 14:50:08 +00:00
yamt 8bf7662829 merge yamt-splraiseipl branch.
- finish implementing splraiseipl (and makeiplcookie).
	  http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
	- complete workqueue(9) and fix its ipl problem, which is reported
	  to cause audio skipping.
	- fix netbt (at least compilation problems) for some ports.
	- fix PR/33218.
2006-12-21 15:55:21 +00:00
chs 975004d9f1 several ext2fs fixes provided by Barry Bouwsma:
- set ip->i_e2fs_dtime to time_second, not time_uptime.
 - don't allow ipref to go negative
 - fs->e2fs.e2fs_icount is a valid inode number, allow it.
2006-12-09 22:07:48 +00:00
chs c398ae9734 a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
   these now always return the parent vnode locked.  namei() works as before.
   lookup() and various other paths no longer acquire vnode locks in the
   wrong order via vrele().  fixes PR 32535.
   as a nice side effect, path lookup is also up to 25% faster.
 - the above allows us to get rid of PDIRUNLOCK.
 - also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
 - remove an assumption in layer_node_find() that all file systems implement
   a recursive VOP_LOCK() (unionfs doesn't).
 - require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
   fill in eopnotsupp() for file systems that don't support being exported
   and remove the checks for NULL.  (layerfs calls these without checking.)
 - in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
   adjust which vnode is locked.  fixes PR 33374.
 - apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
2006-12-09 16:11:50 +00:00
hannken 818b049a35 On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
2006-12-02 17:21:11 +00:00
joerg f39fcf5763 LFS will never set SF_SNAPSHOT and doesn't support ffs_snapgone anyway.
So conditionally the calls to that function on the inclusion of FFS and
allow a LFS-only kernel to link.
2006-11-16 22:29:03 +00:00
christos 5d48d92007 ifdef out an unused function if !FFS_NO_SNAPSHOT 2006-11-16 21:21:34 +00:00
christos 168cd830d2 __unused removal on arguments; approved by core. 2006-11-16 01:32:37 +00:00
jmmv b1fe4841a1 Let ext2fs be built even when none of ffs, lfs and mfs are present. 2006-11-13 16:12:54 +00:00
reinoud dc5b5420b9 Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panic's on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
2006-10-25 22:01:54 +00:00
drochner bffbce1754 import a fix from FreeBSD (rev.1.185):
After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count.  This will force the update of
the parent directory's actual link to actually be scheduled.  Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely.  ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
[plus description about problems woth background fsck solved
by this; irrelevant to NetBSD]

For me, the good effect is at least that I'm getting less filesystem
inconsistencies after a crash.

Approved by christos quite a while ago.
2006-10-24 19:36:26 +00:00
reinoud 0ce809091d Replace the LIST structure mp->mnt_vnodelist to a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earlierst last.

An extra benefit is the removal of the ugly hack from the Berkly days on
LFS.

In the proces, i've also replaced the various variations hand written loops
by the TAILQ_FOREACH() macro's.
2006-10-20 18:58:12 +00:00
yamt 65a8f1c211 ffs_truncate: don't forget to zero the past eof in the case of
blocksize < pagesize.  PR/33777 from Simon Burge.
XXX check other filesystems, esp. lfs.
2006-10-17 11:39:18 +00:00
yamt 72fc92b729 ffs_alloc: remove an assertion which is no longer true. 2006-10-15 12:23:56 +00:00
yamt 560c0c565c don't use g_glock directly. 2006-10-14 09:17:26 +00:00
yamt b7cedb8e34 handle_workitem_freefrag/handle_workitem_freeblocks:
don't fake up inode/vnode pair.
2006-10-14 07:26:29 +00:00
hannken 3e0dbf3bc5 Add __unused to unused function arguments. 2006-10-13 10:21:21 +00:00
thorpej 25433eb1b5 ufs_quotactl(): consume the arguments even if QUOTAS is not defined. 2006-10-12 04:24:40 +00:00
christos 4d595fd7b1 - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
2006-10-12 01:30:41 +00:00
chs 33c1fd1917 add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
2006-10-05 14:48:32 +00:00
christos b64edcaded fix empty if 2006-10-04 15:53:24 +00:00
christos 45234b0cee Coverity CID 3690: Add KASSERT to check for reverse INULL. 2006-10-03 19:04:25 +00:00
christos df9ed85b34 redo previous: It is better to add a KASSERT, since this is code is same
with ufs.
2006-10-03 19:01:29 +00:00
christos b6bf786e1c Coverity CID 3690: Reverse INULL: Add KASSERT. 2006-10-03 18:59:22 +00:00
christos e11b3365c9 Coverity CID 3689: dp cannot be NULL at this point, so don't check for it. 2006-10-03 18:54:08 +00:00
christos 2b372f902d Coverity CID 3156: async = TRUE when LFS_READWRITE is defined, leading to
dead code. Ifdef the dead code appropriately (from Arnaud Lacombe)
2006-10-03 18:24:48 +00:00
christos f1a4e9cae0 Coverity CID 2949: comment out dead code (from Arnaud Lacombe) 2006-09-29 19:37:11 +00:00
perseant 2ac2813b6e Use lockstatus instead of a homebrewed locking system to control
LFCNWRAPSTOP and LFCNWRAPGO.

Be less verbose about the various looping checks: use log() rather than
printf(), and only log anything if we are really looping ("count = 2" is
not an error condition).

Allow dirops sleeping on available space to be interruptible.
2006-09-28 23:08:23 +00:00
jld 1b78265f0e Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers.  Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.

No objections on tech-kern.
2006-09-21 00:11:30 +00:00
perseant 8c43e08b21 Don't remark a locked inode with IN_MODIFIED after writing it to disk,
if we ourselves hold the lock.  This prevents e.g. mknod from hanging
indefinitely.

Also, always use the return value from VOP_ISLOCKED to determine whether
we hold the lock or someone else does, rather than looking into the lock
structure ourselves.
2006-09-15 18:50:49 +00:00
yamt 9d3e3eab23 merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
	- implement an alternative replacement policy
2006-09-15 15:51:12 +00:00
christos 7465b73617 add missing initializers 2006-09-02 07:04:01 +00:00
christos d74781a938 - add missing initializers
- comment out impossible code
2006-09-02 06:48:00 +00:00
christos 0dc26f6dcb remove impossible test 2006-09-02 06:46:04 +00:00
perseant 437e855235 Changes to help the roll-forward agent, to wit:
* Mark being-deleted files in the Ifile so we can finish deleting them
  at fs mount time.
* Flag the Ifile with "cleaner must clean" when writers are waiting for
  the cleaner, rather than relying solely on the cleaner's estimation of
  whether it should clean or not.
* Note partial segments written by a user agent (in particular,
  fsck_lfs) so that repeated rolls forward don't interfere with one
  another.
* Add a new fcntl, LFCNPASS, that allows the log to wrap exactly once,
  for better testing of the validity of checkpoints.
* Keep track of the on-disk nlink count when cleaning, so that we don't
  partially complete directory operations while cleaning.
* Ensure that every single Ifile inode write represents a consistent
  view of the filesystem.  In particular, the accounting for the segment
  we are writing the inode into must be correct, and the accounting for
  the segment that inode used to reside in must be correct.  Rather than
  just rewriting the inode if we wrote it wrong, rewrite the necessary
  ifile blocks before writing the inode so we never write it wrong.
* Don't unmark any VDIROP vnodes if we haven't written them to disk,
  avoiding yet another problem with the "wait for the cleaner" error
  return from lfs_putpages().

Also, move the last callback to an aiodone call, so we no longer do any
memory management from interrupt context.
2006-09-01 19:41:28 +00:00
christos 676e77765a fix missing initializers 2006-08-30 01:28:53 +00:00
christos 57b45699b2 fix incomplete initializer. 2006-08-30 01:26:47 +00:00
martin 12cf319c62 Fix size confusion with lfs_fhandle - and as it now turns out to be the same
as the lfs compat_30_fhandle, g/c the latter.
Add an alias for the LFCNIFILEFH fcntl, so that binaries compiled in the
meantime (with too large lfs_fhandle) continue to work.

This makes vfs_cleanerd work again after the kernel checks filehandle size
more strictly (problem reported by Kurt Schreiner on current-users).
2006-08-06 12:34:12 +00:00
martin b4cb63a646 Make filehandles opaque to userland 2006-07-31 16:34:42 +00:00
ad f474dceb13 Use the LWP cached credentials where sane. 2006-07-23 22:06:03 +00:00
perseant 1e9b73d972 Oops, commit the correct version of lfs_rfw.c. The roll-forward functionality
is known not to work in this version (as it did not previously) but it should
at least compile.
2006-07-20 23:56:27 +00:00
perseant 83771be892 Separate the (non-working) LFS kernel roll-forward code into its own file,
lfs_rfw.c.
2006-07-20 23:49:07 +00:00
perseant 20227e112e Note partial segments that are written by the cleaner, to help out the
roll-forward agent.
2006-07-20 23:16:50 +00:00
perseant 186ffd50ab Loop on the check for lfs_nowrap, so we don't allow a process to squeeze by. 2006-07-20 23:15:39 +00:00
perseant 5fdcd70349 Move the kauth checks up front, so that all new LFS fcntl calls are subject
to the check for superuser privilege.
2006-07-20 23:14:09 +00:00
perseant 8c161d1081 Don't try to write all the vnodes, when the cleaner needs a vnode to be
recycled.
2006-07-20 23:12:26 +00:00
martin 74709a8860 Apply _KERNEL_OPT 2006-07-13 22:08:00 +00:00
martin 3fb505e6b2 Version the lfs_cleanerd internal fcntl() for filehandles too,
so old cleaners should work with newer kernels.
2006-07-13 22:05:52 +00:00
martin a3b5baed42 Fix alignement problems for fhandle_t, exposed by gcc4.1.
While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
2006-07-13 12:00:24 +00:00
perseant a2aa7212a8 Protect lfs_order_freelist() with the segment lock. 2006-07-06 22:27:19 +00:00
perseant b8ec630ade Fix a typo that caused a "multiple free" panic on unmounting a resized lfs. 2006-07-06 22:14:18 +00:00
perseant b99e4c8268 Don't wake up the cleaner if the filesystem is unwrappable, and fix the
compatibility fcntls.

Also includes one-line fixes for an MP locking bug and a zero-length FINFO
problem that manifested during testing.
2006-06-29 19:28:21 +00:00
perseant 1c57171fe3 Change LFCNWRAP{STOP,GO} to make them more suitable for snapshotting; in
particular, the caller can now choose whether to wait for the condition
to be met, and if the caller of LFCNWRAPSTOP dies or otherwise closes
the descriptor, the filesystem is started again.  Updated the ckckp
regression test to use the new semantics.

dump_lfs(8) now uses the fcntls to implement LFS-style snapshotting through
the -X flag, addressing PR#33457 albeit not using fss(4).  Fixed a couple
other problems with dump_lfs that manifested themselves during testing.
2006-06-24 05:28:54 +00:00
yamt e408053d1b fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
  (the name is from freebsd.  currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
2006-06-23 14:13:02 +00:00
hannken 442bf57d1c softdep_sync_metadata: If vp is a block device it may have new I/O requests
posted for it even if the vnode is locked. This will deadlock with wmesg
"softgetdbuf" if it gets a BMSAFEMAP dependency as here we have "bp == nbp"
and try to get a buffer we already own.

Approved by: Frank van der Linden <fvdl@netbsd.org>
2006-06-12 16:37:00 +00:00
kardel 1276c3051e PR 33697: complete timecounter conversion 2006-06-11 09:26:04 +00:00
kardel de4337ab21 merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
	{get,}{micro,nano,bin}time()
	get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
2006-06-07 22:33:33 +00:00
perseant 402f3abc7a Read the inode version number fro a more reliable source, quelling a
diagnostic assertion panic.
2006-05-24 21:08:00 +00:00
cube d897e3cfdb Include <sys/kauth.h> because it's needed. 2006-05-21 22:51:27 +00:00
perseant 0e0bb04d7a Fix a bug in which FINFOs were written with a version number of zero.
Add assertions and add this to the DEBUG fip test in lfs_writeseg.
2006-05-20 01:10:18 +00:00
perseant 6e53d31f5c Break out the finfo array manipulation code into two new functions,
lfs_acquire_finfo() and lfs_release_finfo().  Add a debugging check
for zero-length finfo arrays in the segment summary to avoid future
regressions.
2006-05-18 23:15:09 +00:00
perseant 758cf626b4 Don't duplicate the LFS_STARVED_FOR_SEGS check (an oversight that came
in with rev 1.210).
2006-05-18 00:57:13 +00:00
perseant 48e300c97f Don't be quite so eager to error out from lfs_putpages() when pages are
busy; if we've sensed a possible 3-way deadlock and are not the pagedaemon,
relock and try again.
2006-05-17 19:47:09 +00:00
christos f1e7ec5164 we need <sys/kauth.h> for the kernel. 2006-05-15 03:01:50 +00:00
christos 2536b870ce Don't include <sys/kauth.h>; breaks userland (newfs_lfs) 2006-05-15 00:45:57 +00:00
elad fc9422c9d9 integrate kauth. 2006-05-14 21:31:52 +00:00
christos 12b7ab5f0b Correct a bogus expression gcc4 found. 2006-05-14 05:27:59 +00:00
perseant 285f68c114 Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once.  Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
2006-05-12 23:36:11 +00:00
mrg 084c052803 quell GCC 4.1 uninitialised variable warnings.
XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
2006-05-10 21:53:14 +00:00
perseant 935530188d Change VOP_FCNTL to take an unlocked vnode. Approved by wrstuden@. 2006-05-04 16:48:16 +00:00
perseant ce053245eb Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree".  The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done.  This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
2006-05-04 04:22:55 +00:00
perseant e807d08027 Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
2006-05-02 00:52:26 +00:00
perseant 8696fd25e2 Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
2006-05-01 19:47:29 +00:00
perseant 8fc4e510a9 Add an explicit list initialization that was missing from my last commit. 2006-04-30 21:59:58 +00:00
perseant 481da54fc1 Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included as well.
2006-04-30 21:19:42 +00:00
yamt 1d3a67174f remove unused FFS_NAMES and LFS_NAMES. 2006-04-23 14:15:12 +00:00
perseant 7119533fb9 Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
2006-04-22 00:12:45 +00:00
perseant 7cd0266a27 Regression test improvements:
Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).

More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap).  Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
2006-04-22 00:10:54 +00:00
perseant 5f627fe958 Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow).  Coverity CID 1521.
2006-04-19 00:22:15 +00:00
perseant 80a505b9f7 Don't roll forward if we aren't given a process context. Coverity CID 1076. 2006-04-18 23:40:47 +00:00
perseant e52cd940c0 Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
2006-04-18 22:42:33 +00:00
perseant f58c67b02f Yet another MP locking issue. 2006-04-18 21:41:20 +00:00
christos 53ae068fc6 Coverity CID 746: Remove dead code. lbn >= NDADDR is mutually exclusive to
snapshot_locked == 0.
2006-04-18 21:39:03 +00:00
perseant 0268059112 Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing.  This allows for automated
checkpoint vailidity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small).  Discovered by using the aforementioned
regression test.
2006-04-17 20:02:34 +00:00
christos 0bc8039fc6 Coverity CID 1166: Add KASSERT before deref. 2006-04-15 05:32:29 +00:00
christos 3d772305a8 Coverity CID 1169: Add KASSERT before deref. 2006-04-15 05:31:18 +00:00
christos e14b3e8165 Coverity CID 2858: Avoid NULL deref. 2006-04-15 05:29:10 +00:00
christos 17ed031f90 Coverity CID 2499: Fix uninitialize variable use. 2006-04-15 05:19:08 +00:00
christos 6555ff0ad3 From my posting of April 3 to tech-kern:
My understanding is that the CLRSIG() is supposed to clear the signal
that was sent to the syncer process to prevent it from being delivered
to the syncer process in case unmounting fails, so that the syncer process
does not die while the filesystem is still mounted. The typical scenario
is, the syncher process is tsleep()ing in the kernel, and waking up when
it needs to do work. If someone sends a signal to it, eg. kill -TERM
the mfs process, then the kernel will try to unmount the mfs filesystem
before delivering the signal to the process. If that unmount fails, then
we should not really kill the process because that will hang the mount.
So we call CLRSIG() to stop the signal from being delivered.

So the first call to issignal() will return the signal number that was
sent to the syncer process (unless someone malicious was able to send
a lower numbered signal between the time tsleep() returned and we called
issignal()... something that is not really easy to do). But you are
right, we should not be calling it many times as a side effect of this
macro.

Rewrite CLRSIG() clear all the signals and call issignal() the correct
number of times.
2006-04-15 01:16:40 +00:00
perseant 81ded5df65 Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
2006-04-13 23:46:28 +00:00
perseant 575f22cf94 Another MP locking fix. 2006-04-11 22:08:00 +00:00
perseant 74b70f471b Remove mostly useless BUFPAGES warning message from lfs_{un,}mount. 2006-04-10 23:51:50 +00:00
bouyer eb7f9aba74 Revert previous; I mixed bpp and *bpp when reading ffs_balloc_ufs1().
ffs_balloc() will always allocate a new buffer or leave it as NULL,
so coverity is wrong here, we're not using a freed argument.
2006-04-10 22:01:06 +00:00
bouyer a4181a9049 If we brelse ibp, set ibp to NULL, to avoid reusing it later in balloc()
or in our code at the next iteration.
Coverity ID 2706
2006-04-10 21:50:18 +00:00
perseant 07ebfab840 Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
2006-04-10 21:20:19 +00:00
perseant 017f856cba Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
2006-04-10 21:17:21 +00:00
perseant fbf75b2bf7 Correct a locking bug in the recent pager optimization. 2006-04-10 18:42:48 +00:00
yamt 539544d937 ffs_gop_size: revert a problematic part of 1.78.
problems reported by Kouichirou Hiratsuka and Jukka Salmi on current-users@.
2006-04-09 21:59:35 +00:00
perseant 39ce23c169 Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
2006-04-08 00:26:34 +00:00
perseant ff84dd347a Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
2006-04-08 00:16:56 +00:00
perseant 7c22dcc8a6 Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
  while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
  something the pagedaemon is relying on.
2006-04-07 23:59:28 +00:00
perseant d28248e84e Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
2006-04-07 23:44:14 +00:00
uwe 7494d34448 Tell config to generate fs_ffs.h as vfs_bio.c checks for defined(FFS).
Include that header in vfs_bio.c so that bioops are not redefined.
2006-04-05 00:52:16 +00:00
pavel 929734802b Correct typo in a panic message. 2006-04-04 17:12:57 +00:00
perseant 51afd83ada Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
2006-04-01 00:13:01 +00:00
perseant 418bf18f53 Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount.  This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
2006-03-31 02:31:37 +00:00
yamt c5fcdd1719 some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
  they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
  thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
  otherwise genfs_getpages end up to allocate pages past eof unnecessarily.
2006-03-30 12:40:06 +00:00
perseant 0a4e8d80c1 Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
2006-03-28 23:57:41 +00:00
perseant afc725a1c7 Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
2006-03-28 01:29:55 +00:00
perseant dddf5c5171 Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
  disk than clean segments.  When we reach the danger line,
  lfs_gop_write() now returns EAGAIN.  The caller of VOP_PUTPAGES(), if
  it holds the segment lock, drops it and waits for the cleaner to make
  room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
  a page busy blocks on the cleaner while the cleaner blocks on the
  segment lock while lfs_putpages blocks on the page).
2006-03-24 20:05:32 +00:00
hannken cd28767efa ffs_balloc*(): Add an assertion for "bpp != NULL" if B_METAONLY is set.
From Coverity CIDs 1170..1173
2006-03-23 11:16:47 +00:00
matt 0486735479 More MALLOC -> malloc changes. 2006-03-19 17:50:42 +00:00
rtr aa6b2db95f init struct vnode *vp = NULL
coverity 2724 / run 6
XXX in future runs coverity may complain about deref NULL now but comment
    on line 382 indicates this should not be possible
2006-03-19 04:10:02 +00:00
rtr 7818c9e2d0 don't bother checking of ts == NULL before assigning since we know that
it is.
solves coverity 2725 / run 6
2006-03-19 03:58:34 +00:00
bouyer 9d8928a40d Fix dead error condition, coverity ID 747. 2006-03-18 13:56:51 +00:00
bouyer d8a43c47ae Fix a dead error condition, coverity ID 603. 2006-03-18 13:54:21 +00:00
bouyer b1dc0ca141 Remove dead code, fixing coverity ID 745. nameiop can only be CREATE
or DELETE here. This code got cut-n-pasted from ufs_loolup.c, but
is only used in whiteout support. ext2fs doesn't support whiteout.
2006-03-18 13:49:19 +00:00
bouyer f7123013b8 bread() will always return a valid bp. So remplace the (always true) if (bp)
with a KASSERT.
Should fix Coverity ID 2444.
2006-03-18 12:48:38 +00:00
christos 5a57baa413 don't use MALLOC with a non-constant size; use malloc instead. 2006-03-17 23:29:07 +00:00
tls a67eab5ee4 From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space.  Now we're not.
2006-03-17 23:21:01 +00:00
christos 1b2709754a cleanup more SET/CLR/ISSET lossage 2006-03-05 17:33:33 +00:00
yamt ec5a93183a merge yamt-uio_vmspace branch.
- use vmspace rather than proc or lwp where appropriate.
  the latter is more natural to specify an address space.
  (and less likely to be abused for random purposes.)
- fix a swdmover race.
2006-03-01 12:38:10 +00:00
thorpej 58853410ae Use device_class() instead of accessing dv_class directly. 2006-02-21 04:32:38 +00:00
perry fbae48b901 Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
2006-02-16 20:17:12 +00:00
dsl 6f0f9f8763 Make almost everything #include <sys/bswap.h> instead of <machine/bswap.h>
The bswap.h and endian.h files are all rather incestuous, but I want to
get the constant folding stuff into one place - sys/bswap.h
2006-01-29 21:42:40 +00:00
christos 9c6e6ff8b2 Protect against uio_lwp being NULL from Pavel Cahyna 2006-01-14 23:49:59 +00:00
yamt 03f80508d6 - unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code.  (disabled for now.)
2006-01-14 17:41:16 +00:00
yamt 77e5f3531a make ufsdirhash_pool static. 2006-01-14 09:09:39 +00:00
yamt 3a6eed1f58 pull freebsd's ufs_lookup.c rev.1.53 and 1.54. PR/31873.
> ----------------------------
> revision 1.54
> date: 2001/08/26 01:25:12;  author: iedowse;  state: Exp;  lines: +30 -12
> When compacting directories, ufs_direnter() always trusted DIRSIZ()
> to supply the number of bytes to be bcopy()'d to move an entry. If
> d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible
> length, so ufs_direnter could end up corrupting a directory during
> compaction. In practice I believe this can only happen after fsck_ffs
> has fixed a previously-corrupted directory.
>
> We now deal with any mid-block unused entries specially to avoid
> using DIRSIZ() or bcopy() on such entries. We also ensure that the
> variables 'dsize' and 'spacefree' contain meaningful values at all
> times. Add a few comments to describe better this intricate piece
> of code.
>
> The special handling of mid-block unused entries makes the dirhash-
> specific bugfix in the previous revision (1.53) now uncecessary,
> so this change removes it.
>
> Reviewed by:  mckusick
> ----------------------------
> revision 1.53
> date: 2001/08/22 01:35:17;  author: iedowse;  state: Exp;  lines: +2 -2
> When compressing directory blocks, the dirhash code didn't check
> that the directory entry was in use before attempting to find it
> in the hash structures to change its offset. Normally, unused
> entries do not need to be moved, but fsck can leave behind some
> unused entries that do. A dirhash sanity panic resulted when the
> entry to be moved was not found. Add a check that stops entries
> with d_ino == 0 from being passed to ufsdirhash_move().
2006-01-14 09:09:02 +00:00
yamt 6af60103dc FSFMT: whitespace. 2006-01-13 00:50:58 +00:00
yamt eaebcf6b5b ufsdirhash_build: yield cpu when looping on directory entries. 2006-01-13 00:50:25 +00:00
yamt 2fc5e44a62 remove an obsolete prototype. 2006-01-06 09:27:55 +00:00
yamt 7b826aac85 initialize necessary members of struct buf. PR/32462 from Reinoud Zandijk. 2006-01-06 09:21:44 +00:00
yamt 690d424f28 - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
2006-01-04 10:13:05 +00:00
chs 0545b6e0cb changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
   make those fields always present.
 - for functions which are conditionally inline, make them never inline.
 - remove some other functions which are conditionally defined but
   don't actually do anything anymore.
 - make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
2005-12-27 04:06:45 +00:00
perry 3d4ed1fbc7 __inline__ -> inline 2005-12-24 23:41:33 +00:00
perry 0f0296d88a Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete. 2005-12-24 20:45:08 +00:00
rpaulo fc2fb45bf0 Convert UFS_EXTATTR to struct lwp. 2005-12-23 23:20:00 +00:00
yamt 523e856cba prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
2005-12-23 15:31:40 +00:00
christos fff3c8238c add fwd declaration for struct proc. Fixes vax build. 2005-12-13 16:25:59 +00:00
christos 95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
yamt 221616873d merge yamt-readahead branch. 2005-11-29 22:52:02 +00:00
dsl d59e7ef247 Force some multiplies to give a 64 bit result to avoid dirsize being zero
and causing a divide by zero trap later.
Fixes a panic noted in netbsd-help.
2005-11-27 11:45:56 +00:00
yamt 6a17dd42f4 - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge.  PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
2005-11-11 15:50:57 +00:00
gdt 2de7c6cd0d Adjust signature of softdep_freefile (dummy stub which always panics
if called) to match ffs_extern.h so that kernels w/o softdep can compile.
2005-11-02 22:10:41 +00:00
yamt a748ea88dd merge yamt-vop branch. remove following VOPs.
VOP_BLKATOFF
	VOP_VALLOC
	VOP_BALLOC
	VOP_REALLOCBLKS
	VOP_VFREE
	VOP_TRUNCATE
	VOP_UPDATE
2005-11-02 12:38:58 +00:00
simonb ad33b0d825 We don't need <sys/systm.h> here. 2005-10-30 23:34:34 +00:00
simonb 1d1300cd80 Only include <sys/systm.h> if _KERNEL is defined. 2005-10-30 23:34:04 +00:00
yamt aec75b1cc6 - change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
  have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
  from sys/bufq.h to sys/bufq_impl.h.
  (is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c.  (not tested)
2005-10-15 17:29:10 +00:00
chs 6c50e54c82 avoid the need for a bogus initializer. 2005-10-08 03:21:17 +00:00
yamt baee927713 introduce "ufs_ops" and use it for ITIMES. 2005-09-27 06:48:55 +00:00
yamt 050407b699 change um_maxfilesize to unsigned as its on-disk counterpart is. 2005-09-27 06:48:16 +00:00
yamt d3a07546a6 revert ffs_snapshot.c 1.20 because it's bogus. pointed by Simon Burge. 2005-09-26 14:10:32 +00:00
yamt 6138b82a56 always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
make timestamps go backwards.
2005-09-26 13:52:20 +00:00
jmmv 9ba32cead7 Follow compat naming tradition: rename compat_export_args to export_args30. 2005-09-25 21:17:05 +00:00
jmmv 2a3e5eeb7c Apply the NFS exports list rototill patch:
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
  function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
  file sys/nfs/nfs_export.c.  The former was becoming large and its code
  is always compiled, regardless of the build options.  Using the latter,
  the code is only compiled in when NFSSERVER is enabled.  While doing this,
  also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
  path and a set of export entries.  At the moment it can only clear the
  exports list or append entries, one by one, but it is done in a way that
  allows setting the whole set of entries atomically in the future (see the
  comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
  that it becomes file system agnostic.  In fact, all this whole thing was
  done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
  exports initialization; done internally by the kernel when initializing
  the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
  subsystems can run arbitrary code upon receipt of specific VFS events.
  At the moment, this only provides support for unmount and is used to
  destroy NFS exports lists from the file systems being unmounted, though it
  has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
2005-09-23 12:10:31 +00:00
rpaulo 3c4f143c6e Fix bogus if-clause introduced in previous revision. 2005-09-22 14:04:29 +00:00
rpaulo a12bed5a16 In ffs_unmount(), detect EOPNOTSUPP errno returned from
ufs_extattr_stop().

From FreeBSD.
2005-09-22 13:50:55 +00:00
rpaulo 1b8fb7a81f In ufs_extattr_stop(), if we haven't started yet, errno must be set
before bailing out.

From FreeBSD.
2005-09-22 13:49:03 +00:00
yamt 6dadccf7c5 ufs_balloc_range: correct range to clear PG_RDONLY.
fix a panic in ubc_fault.
2005-09-14 10:33:25 +00:00
christos ebc4ea57cf redefine panic if we are a user program. 2005-09-13 04:40:42 +00:00
christos 3544d898ac split out lfs_itimes(). It is used in fsck_lfs. 2005-09-13 04:13:25 +00:00
christos 49840169c0 Add another KASSERT. 2005-09-12 20:26:44 +00:00
christos c93a283e5f - access the ffs and ext2fs itimes functions through a pointer, so that
if the filesystem is not compiled in the kernel still links. Probably
  a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
2005-09-12 20:23:03 +00:00
christos 30b59dc1e8 Add a KASSERT like the one ffs has. 2005-09-12 20:21:18 +00:00
drochner 9cde940a73 move the new ffs_itimes() to a berr place -- ffs_subr.c is shared with
userland
2005-09-12 20:09:59 +00:00
christos a12024da06 Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
2005-09-12 16:24:41 +00:00
rpaulo ffd7544c80 Add missing '$' in __RCSID(). 2005-09-12 16:10:11 +00:00
rpaulo f2b738e568 In ufs_extattr_start(), unlock uepm_lock when bailing out.
Ok'd Jason Thorpe.
2005-09-12 16:09:06 +00:00
yamt d8798fec66 - for pagecache dependency, track which page in the block
has been written or not individually by (ab)using b_resid
  in pcbp as a bitmap.
- add a comment to explain why it's needed.

PR/15364.  reviewed by Chuck Silvers.
2005-09-09 15:04:07 +00:00
yamt 5b4c989faf revert the code to expand putpage requests to block boundary.
because:
	- it was incomplete in some cases.
	- it can confuse pagedaemon.
see PR/15364 for details.
2005-09-09 15:00:39 +00:00
xtraeme 23ebf62d26 * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
2005-08-30 22:01:12 +00:00
thorpej e1afed9c2d Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature.  I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress.  There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
2005-08-28 19:37:58 +00:00
yamt 4c32aa5945 PRId64 -> ld in UVMHIST_LOG format strings. 2005-08-24 10:19:43 +00:00
yamt d5c3f1e190 ufs_readdir: don't leak kernel garbage to userland. 2005-08-23 12:27:47 +00:00
yamt 3f2c6f0661 ufs_readdir: when computing the maximum number of entries,
use _DIRENT_RECLEN(cdp, 1) instead of "4".
2005-08-23 12:27:16 +00:00
christos 0b0eb1328b Don't overload MAXNAMLEN, use a separate constant for each filesystem type. 2005-08-23 08:05:13 +00:00
yamt 84c9e5bbc1 whitespace. 2005-08-22 09:08:17 +00:00
christos b0e192f2b6 change ino_t to u_int32_t for syscall compatibility. 2005-08-22 08:53:03 +00:00
christos 23e602002f now that we've changed the _DIRENT_ALIGN macro, provide a d_fileno for struct
direct
2005-08-19 05:28:48 +00:00
christos 50f8955b6e 64 bit inode changes. 2005-08-19 02:04:03 +00:00
jmmv 38501db2ff Drop extra word from comment. 2005-08-12 22:31:51 +00:00
christos bce5269120 Move extern kernel variable declarations, into a _KERNEL protected session
so that the don't pollute userland's namespace.
2005-07-31 20:18:32 +00:00
yamt 946832fd33 revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
2005-07-26 12:14:46 +00:00
drochner e32ba1775e fix crash in mount error handling: don't free storage which was not
malloc'd
2005-07-25 11:42:38 +00:00
yamt b7bfe82866 update file timestamps for nfsd loaned-read and mmap.
PR/25279.  discussed on tech-kern@.
2005-07-23 12:18:41 +00:00
yamt 6afb995fea ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.
2005-07-21 22:00:08 +00:00
yamt 2a6dc9d02d - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
  page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
  VM_PROT_READ.
2005-07-17 09:13:35 +00:00