Commit Graph

1417 Commits

Author SHA1 Message Date
christos
12b7ab5f0b Correct a bogus expression gcc4 found. 2006-05-14 05:27:59 +00:00
perseant
285f68c114 Fixes to address the "vinvalbuf: dirty blocks" panic that can occur when
many inodes are cleaned at once.  Make sure that we write all the pages
on vnodes that are being flushed, even if we don't think there's room;
drain v_numoutput before lfs_vflush() completes.

Also, don't allow a vnode that is in the process of being cleaned to be
chosen by getnewvnode(); this avoids a segment accounting panic in the case
that a large number of inodes are fed to lfs_markv() all at once.
2006-05-12 23:36:11 +00:00
mrg
084c052803 quell GCC 4.1 uninitialised variable warnings.
XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree.
2006-05-10 21:53:14 +00:00
perseant
935530188d Change VOP_FCNTL to take an unlocked vnode. Approved by wrstuden@. 2006-05-04 16:48:16 +00:00
perseant
ce053245eb Introduce another per-filesystem parameter, lfs_resvseg, to separate the
notion of "how many segments are reserved for the cleaner" from that of
"how many segments are not counted in lfs_bfree".  The default value
used for existing filesystems is the same as the previous implicit value
of (lfs_minfreeseg / 2 + 1), modulo some sanity checking.

Count pending dirops on a per-filesystem basis, since once we start
writing them we can't stop until we're done.  This seems to help stave off
the "no clean segments" panic in the case of filling the filesystem with
directories and small files (e.g. simultaneously unpacking more copies of
pkgsrc than will fit).
2006-05-04 04:22:55 +00:00
perseant
e807d08027 Fix a "locking against myself": lfs_flush_dirops() doesn't need to lock the
vnodes to write their blocks, since it holds the segment lock.
2006-05-02 00:52:26 +00:00
perseant
8696fd25e2 Don't ever partially write dirops, even if we need the cleaner to run.
This increases the chances of the "no clean segments" panic slightly,
but allows us to run the ckckp regression test successfully to completion.
2006-05-01 19:47:29 +00:00
perseant
8fc4e510a9 Add an explicit list initialization that was missing from my last commit. 2006-04-30 21:59:58 +00:00
perseant
481da54fc1 Postpone the segment accounting changes coming from truncation until the
inode that makes those changes valid is either written to disk by
lfs_writeinode() or discarded by lfs_vfree().

A couple of locking fixes are also included.
2006-04-30 21:19:42 +00:00
yamt
1d3a67174f remove unused FFS_NAMES and LFS_NAMES. 2006-04-23 14:15:12 +00:00
perseant
7119533fb9 Fix a fencepost error in the bitmap handling in extend_ifile(), and another
in lfs_freelist_prev().
2006-04-22 00:12:45 +00:00
perseant
7cd0266a27 Regression test improvements:
Move the stop for LFCNWRAPSTOP to the point at which writing at segment 0
is really about to commence, since this is what the test expects (and
incidentally what a snapshotting utility wants as well).

More correctly reconstruct the on-disk state at every checkpoint, rather
than relying on the entire state at the point of wrapping to be accurate
(that is only true the first time we wrap).  Add a "make abort" target to
make rerunning the test more convenient when it has failed and we're done
analyzing the failure.
2006-04-22 00:10:54 +00:00
perseant
5f627fe958 Avoid a possible sign overflow condition in lfs_truncate, which would result
in a buffer overflow (underflow).  Coverity CID 1521.
2006-04-19 00:22:15 +00:00
perseant
80a505b9f7 Don't roll forward if we aren't given a process context. Coverity CID 1076. 2006-04-18 23:40:47 +00:00
perseant
e52cd940c0 Get rid of the LFS_FORCE_WRITE case. We never really used it, and it could
panic the kernel if the cleaner daemon passed the right combination of arguments.
Coverity CID 2741.
2006-04-18 22:42:33 +00:00
perseant
f58c67b02f Yet another MP locking issue. 2006-04-18 21:41:20 +00:00
christos
53ae068fc6 Coverity CID 746: Remove dead code. lbn >= NDADDR is mutually exclusive with
snapshot_locked == 0.
2006-04-18 21:39:03 +00:00
perseant
0268059112 Introduce two fcntl calls that freeze the filesystem right at the point
where segment 0 is being considered for writing.  This allows for automated
checkpoint validity scanning, and could be used (in conjunction with the
existing LFCNREWIND) for e.g. snapshot dumps as well.

Include a regression test that does such scanning.

When writing the Ifile, loop through the dirty block list three times to
make sure that the checkpoint is always consistent (the first and second
times the Ifile blocks can cross a segment boundary; not so the third time
unless the segments are very small).  Discovered by using the aforementioned
regression test.
2006-04-17 20:02:34 +00:00
christos
0bc8039fc6 Coverity CID 1166: Add KASSERT before deref. 2006-04-15 05:32:29 +00:00
christos
3d772305a8 Coverity CID 1169: Add KASSERT before deref. 2006-04-15 05:31:18 +00:00
christos
e14b3e8165 Coverity CID 2858: Avoid NULL deref. 2006-04-15 05:29:10 +00:00
christos
17ed031f90 Coverity CID 2499: Fix uninitialized variable use. 2006-04-15 05:19:08 +00:00
christos
6555ff0ad3 From my posting of April 3 to tech-kern:
My understanding is that the CLRSIG() is supposed to clear the signal
that was sent to the syncer process to prevent it from being delivered
to the syncer process in case unmounting fails, so that the syncer process
does not die while the filesystem is still mounted. The typical scenario
is, the syncer process is tsleep()ing in the kernel, and waking up when
it needs to do work. If someone sends a signal to it, e.g. kill -TERM
the mfs process, then the kernel will try to unmount the mfs filesystem
before delivering the signal to the process. If that unmount fails, then
we should not really kill the process because that will hang the mount.
So we call CLRSIG() to stop the signal from being delivered.

So the first call to issignal() will return the signal number that was
sent to the syncer process (unless someone malicious was able to send
a lower numbered signal between the time tsleep() returned and we called
issignal()... something that is not really easy to do). But you are
right, we should not be calling it many times as a side effect of this
macro.

Rewrite CLRSIG() to clear all the signals and call issignal() the correct
number of times.
2006-04-15 01:16:40 +00:00
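
The hazard discussed in 6555ff0ad3, a macro that calls a function with side
effects more often than the caller expects, can be shown in isolation. This
is only a sketch: pending_mask, lowest_pending(), CLEAR_BAD() and clear_all()
are hypothetical stand-ins invented for illustration, not the mfs code or the
real issignal()/CLRSIG() definitions.

```c
#include <assert.h>

static unsigned pending_mask;	/* hypothetical: bit n set = signal n pending */
static int calls;		/* how often the stand-in was invoked */

/* Stand-in for a query such as issignal(): lowest pending signal, or 0. */
int
lowest_pending(void)
{
	int n;

	calls++;
	for (n = 1; n < 32; n++)
		if (pending_mask & (1u << n))
			return n;
	return 0;
}

/* Hazard: the argument is expanded twice, so a call passed in runs twice. */
#define CLEAR_BAD(sig) \
	do { if ((sig) != 0) pending_mask &= ~(1u << (sig)); } while (0)

/* Alternative: one call per loop iteration, repeated until nothing is left. */
void
clear_all(void)
{
	int sig;

	while ((sig = lowest_pending()) != 0) {
		assert(sig > 0 && sig < 32);
		pending_mask &= ~(1u << sig);
	}
}
```

With CLEAR_BAD(lowest_pending()) the stand-in runs twice to clear a single
signal; clear_all() calls it once per pending signal plus one final call that
returns 0.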
perseant
81ded5df65 Make lfs_vref/lfs_vunref not need to know about VXLOCK and VFREEING
explicitly (especially since we didn't know about VFREEING at all before),
but notice the EBUSY return from vget() instead.

Fix some more MP locking protocol issues, most of which were pointed out by
Christian Ehrhardt this morning on tech-kern.
2006-04-13 23:46:28 +00:00
perseant
575f22cf94 Another MP locking fix. 2006-04-11 22:08:00 +00:00
perseant
74b70f471b Remove mostly useless BUFPAGES warning message from lfs_{un,}mount. 2006-04-10 23:51:50 +00:00
bouyer
eb7f9aba74 Revert previous; I mixed bpp and *bpp when reading ffs_balloc_ufs1().
ffs_balloc() will always allocate a new buffer or leave it as NULL,
so coverity is wrong here, we're not using a freed argument.
2006-04-10 22:01:06 +00:00
bouyer
a4181a9049 If we brelse ibp, set ibp to NULL, to avoid reusing it later in balloc()
or in our code at the next iteration.
Coverity ID 2706
2006-04-10 21:50:18 +00:00
perseant
07ebfab840 Optimize the free list search a little more; in particular use words
instead of bytes for the index, and never search below fs->lfs_freehd.

Fix a bug in the previous version of the search (an erroneous assumption
that ino_t was signed).

Free the bitmap when we unmount the filesystem.
2006-04-10 21:20:19 +00:00
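
The word-at-a-time search described in 07ebfab840 might look roughly like the
sketch below. It assumes a bitmap in which a set bit means "free" and a lower
bound playing the role of fs->lfs_freehd; find_first_free() and NO_FREE_ENTRY
are hypothetical names, and this is not the LFS implementation. All indices
are unsigned, in line with the note about ino_t not being signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD	32u
#define NO_FREE_ENTRY	UINT32_MAX	/* hypothetical "nothing found" value */

/*
 * Return the lowest free entry >= start, or NO_FREE_ENTRY.
 * map holds nbits valid bits; a set bit marks a free entry.
 */
uint32_t
find_first_free(const uint32_t *map, uint32_t nbits, uint32_t start)
{
	uint32_t nwords, first, w, b, word, idx;

	assert(map != NULL);
	if (start >= nbits)
		return NO_FREE_ENTRY;

	nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	first = start / BITS_PER_WORD;
	for (w = first; w < nwords; w++) {
		word = map[w];
		if (w == first)		/* never look below start */
			word &= UINT32_MAX << (start % BITS_PER_WORD);
		if (word == 0)		/* skips 32 entries in one test */
			continue;
		for (b = 0; b < BITS_PER_WORD; b++) {
			if (word & (UINT32_C(1) << b)) {
				idx = w * BITS_PER_WORD + b;
				/* bits past nbits in the last word are padding */
				return idx < nbits ? idx : NO_FREE_ENTRY;
			}
		}
	}
	return NO_FREE_ENTRY;
}
```

Comparing idx against nbits before returning is the kind of boundary check
where a fencepost error can creep in.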
perseant
017f856cba Don't leak vnode references if we fail to lock a vnode in lfs_flush_pchain().
Also fix another (probably only academic) simple_lock protocol error.
2006-04-10 21:17:21 +00:00
perseant
fbf75b2bf7 Correct a locking bug in the recent pager optimization. 2006-04-10 18:42:48 +00:00
yamt
539544d937 ffs_gop_size: revert a problematic part of 1.78.
problems reported by Kouichirou Hiratsuka and Jukka Salmi on current-users@.
2006-04-09 21:59:35 +00:00
perseant
39ce23c169 Implement a somewhat finer-grained mechanism for paging LFS-backed pages.
The writer daemon, if it does not need to flush the whole filesystem,
now only writes the vnodes for which the pagedaemon has requested pageouts
(although it does not pay attention to the page ranges the pagedaemon
supplies).
2006-04-08 00:26:34 +00:00
perseant
ff84dd347a Keep the free list ordered. This solves a problem first pointed out to me
by Michel Oey, in which an aged LFS writes up to an extra Ifile block for
every file created; and paves the way for the truncation of the Ifile when
many files are deleted.
2006-04-08 00:16:56 +00:00
perseant
7c22dcc8a6 Several minor bug fixes:
* Correct (weak) segment lock assertions in lfs_fragextend and lfs_putpages.
* Keep IN_MODIFIED set if we run out of avail in lfs_putpages.
* Don't try to (re)write buffers on a VBLK vnode; fixes a panic I found
  while running with an LFS root.
* Raise priority of LFCNSEGWAIT to PVFS; PUSER is way too low for
  something the pagedaemon is relying on.
2006-04-07 23:59:28 +00:00
perseant
d28248e84e Make the segment lock aware of LWPs. Fixes a (somewhat confusing)
"lockmgr: pid 3997, not exclusive lockholder 3997, unlocking" panic I
encountered while running blogbench on an LFS.
2006-04-07 23:44:14 +00:00
uwe
7494d34448 Tell config to generate fs_ffs.h as vfs_bio.c checks for defined(FFS).
Include that header in vfs_bio.c so that bioops are not redefined.
2006-04-05 00:52:16 +00:00
pavel
929734802b Correct typo in a panic message. 2006-04-04 17:12:57 +00:00
perseant
51afd83ada Make sure we unlock to zero when avoiding 3-way deadlock; otherwise we
simply have a different form of deadlock.
2006-04-01 00:13:01 +00:00
perseant
418bf18f53 Handle the "filesystem is clean" flag correctly when upgrading from
read-only to read-write mount.  This makes "root on lfs" work for me,
although it looks like a different traceback from PR#32667.
2006-03-31 02:31:37 +00:00
yamt
c5fcdd1719 some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
  they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
  thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
  otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
2006-03-30 12:40:06 +00:00
perseant
0a4e8d80c1 Double-checkpoint on unmount. This ensures that vnodes belonging to removed
files are really freed, preventing occasional spurious EBUSY returns from
vflush().
2006-03-28 23:57:41 +00:00
perseant
afc725a1c7 Don't let the pagedaemon wait for pages, since that is just asking for
a deadlock.
2006-03-28 01:29:55 +00:00
perseant
dddf5c5171 Improvements to LFS's paging mechanism, to wit:
* Acknowledge that sometimes there are more dirty pages to be written to
  disk than clean segments.  When we reach the danger line,
  lfs_gop_write() now returns EAGAIN.  The caller of VOP_PUTPAGES(), if
  it holds the segment lock, drops it and waits for the cleaner to make
  room before continuing.

* Note and avoid a three-way deadlock in lfs_putpages (a writer holding
  a page busy blocks on the cleaner while the cleaner blocks on the
  segment lock while lfs_putpages blocks on the page).
2006-03-24 20:05:32 +00:00
hannken
cd28767efa ffs_balloc*(): Add an assertion for "bpp != NULL" if B_METAONLY is set.
From Coverity CIDs 1170..1173
2006-03-23 11:16:47 +00:00
matt
0486735479 More MALLOC -> malloc changes. 2006-03-19 17:50:42 +00:00
rtr
aa6b2db95f init struct vnode *vp = NULL
coverity 2724 / run 6
XXX in future runs coverity may complain about deref NULL now but comment
    on line 382 indicates this should not be possible
2006-03-19 04:10:02 +00:00
rtr
7818c9e2d0 don't bother checking if ts == NULL before assigning since we know that
it is.
solves coverity 2725 / run 6
2006-03-19 03:58:34 +00:00
bouyer
9d8928a40d Fix dead error condition, coverity ID 747. 2006-03-18 13:56:51 +00:00
bouyer
d8a43c47ae Fix a dead error condition, coverity ID 603. 2006-03-18 13:54:21 +00:00
bouyer
b1dc0ca141 Remove dead code, fixing coverity ID 745. nameiop can only be CREATE
or DELETE here. This code got cut-n-pasted from ufs_lookup.c, but
is only used in whiteout support. ext2fs doesn't support whiteout.
2006-03-18 13:49:19 +00:00
bouyer
f7123013b8 bread() will always return a valid bp. So replace the (always true) if (bp)
with a KASSERT.
Should fix Coverity ID 2444.
2006-03-18 12:48:38 +00:00
christos
5a57baa413 don't use MALLOC with a non-constant size; use malloc instead. 2006-03-17 23:29:07 +00:00
tls
a67eab5ee4 From Konrad Schroeder, in response to strange df output on anoncvs.netbsd.org:
We were returning the wrong value for free space.  Now we're not.
2006-03-17 23:21:01 +00:00
christos
1b2709754a cleanup more SET/CLR/ISSET lossage 2006-03-05 17:33:33 +00:00
yamt
ec5a93183a merge yamt-uio_vmspace branch.
- use vmspace rather than proc or lwp where appropriate.
  the latter is more natural to specify an address space.
  (and less likely to be abused for random purposes.)
- fix a swdmover race.
2006-03-01 12:38:10 +00:00
thorpej
58853410ae Use device_class() instead of accessing dv_class directly. 2006-02-21 04:32:38 +00:00
perry
fbae48b901 Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
2006-02-16 20:17:12 +00:00
dsl
6f0f9f8763 Make almost everything #include <sys/bswap.h> instead of <machine/bswap.h>
The bswap.h and endian.h files are all rather incestuous, but I want to
get the constant folding stuff into one place - sys/bswap.h
2006-01-29 21:42:40 +00:00
christos
9c6e6ff8b2 Protect against uio_lwp being NULL. From Pavel Cahyna. 2006-01-14 23:49:59 +00:00
yamt
03f80508d6 - unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code.  (disabled for now.)
2006-01-14 17:41:16 +00:00
yamt
77e5f3531a make ufsdirhash_pool static. 2006-01-14 09:09:39 +00:00
yamt
3a6eed1f58 pull freebsd's ufs_lookup.c rev.1.53 and 1.54. PR/31873.
> ----------------------------
> revision 1.54
> date: 2001/08/26 01:25:12;  author: iedowse;  state: Exp;  lines: +30 -12
> When compacting directories, ufs_direnter() always trusted DIRSIZ()
> to supply the number of bytes to be bcopy()'d to move an entry. If
> d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible
> length, so ufs_direnter could end up corrupting a directory during
> compaction. In practice I believe this can only happen after fsck_ffs
> has fixed a previously-corrupted directory.
>
> We now deal with any mid-block unused entries specially to avoid
> using DIRSIZ() or bcopy() on such entries. We also ensure that the
> variables 'dsize' and 'spacefree' contain meaningful values at all
> times. Add a few comments to describe better this intricate piece
> of code.
>
> The special handling of mid-block unused entries makes the dirhash-
> specific bugfix in the previous revision (1.53) now unnecessary,
> so this change removes it.
>
> Reviewed by:  mckusick
> ----------------------------
> revision 1.53
> date: 2001/08/22 01:35:17;  author: iedowse;  state: Exp;  lines: +2 -2
> When compressing directory blocks, the dirhash code didn't check
> that the directory entry was in use before attempting to find it
> in the hash structures to change its offset. Normally, unused
> entries do not need to be moved, but fsck can leave behind some
> unused entries that do. A dirhash sanity panic resulted when the
> entry to be moved was not found. Add a check that stops entries
> with d_ino == 0 from being passed to ufsdirhash_move().
2006-01-14 09:09:02 +00:00
yamt
6af60103dc FSFMT: whitespace. 2006-01-13 00:50:58 +00:00
yamt
eaebcf6b5b ufsdirhash_build: yield cpu when looping on directory entries. 2006-01-13 00:50:25 +00:00
yamt
2fc5e44a62 remove an obsolete prototype. 2006-01-06 09:27:55 +00:00
yamt
7b826aac85 initialize necessary members of struct buf. PR/32462 from Reinoud Zandijk. 2006-01-06 09:21:44 +00:00
yamt
690d424f28 - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
2006-01-04 10:13:05 +00:00
chs
0545b6e0cb changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
  make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
  don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
2005-12-27 04:06:45 +00:00
perry
3d4ed1fbc7 __inline__ -> inline 2005-12-24 23:41:33 +00:00
perry
0f0296d88a Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete. 2005-12-24 20:45:08 +00:00
rpaulo
fc2fb45bf0 Convert UFS_EXTATTR to struct lwp. 2005-12-23 23:20:00 +00:00
yamt
523e856cba prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
2005-12-23 15:31:40 +00:00
christos
fff3c8238c add fwd declaration for struct proc. Fixes vax build. 2005-12-13 16:25:59 +00:00
christos
95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
yamt
221616873d merge yamt-readahead branch. 2005-11-29 22:52:02 +00:00
dsl
d59e7ef247 Force some multiplies to give a 64 bit result to avoid dirsize being zero
and causing a divide by zero trap later.
Fixes a panic noted in netbsd-help.
2005-11-27 11:45:56 +00:00
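
The failure mode fixed in d59e7ef247 can be reproduced outside the kernel.
The sketch below is not the actual expression from that commit; product32(),
product64() and entries_per_dir() are hypothetical names. It assumes a
platform where unsigned int is 32 bits wide, so that the product of two
32-bit operands wraps modulo 2^32 unless one operand is widened first.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit multiply: the result wraps, so 65536 * 65536 yields 0. */
uint32_t
product32(uint32_t a, uint32_t b)
{
	return a * b;
}

/* Widen one operand first so the multiply is carried out in 64 bits. */
uint64_t
product64(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/* A later division traps if the wrapped (zero) size reaches it. */
uint64_t
entries_per_dir(uint64_t total, uint64_t dirsize)
{
	assert(dirsize != 0);
	return total / dirsize;
}
```

Feeding product32(65536, 65536) into entries_per_dir() would trip the
assertion; with product64() the divisor is 4294967296 and the division is
well defined.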
yamt
6a17dd42f4 - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge.  PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
2005-11-11 15:50:57 +00:00
gdt
2de7c6cd0d Adjust signature of softdep_freefile (dummy stub which always panics
if called) to match ffs_extern.h so that kernels w/o softdep can compile.
2005-11-02 22:10:41 +00:00
yamt
a748ea88dd merge yamt-vop branch. remove following VOPs.
	VOP_BLKATOFF
	VOP_VALLOC
	VOP_BALLOC
	VOP_REALLOCBLKS
	VOP_VFREE
	VOP_TRUNCATE
	VOP_UPDATE
2005-11-02 12:38:58 +00:00
simonb
ad33b0d825 We don't need <sys/systm.h> here. 2005-10-30 23:34:34 +00:00
simonb
1d1300cd80 Only include <sys/systm.h> if _KERNEL is defined. 2005-10-30 23:34:04 +00:00
yamt
aec75b1cc6 - change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
  have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
  from sys/bufq.h to sys/bufq_impl.h.
  (is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c.  (not tested)
2005-10-15 17:29:10 +00:00
chs
6c50e54c82 avoid the need for a bogus initializer. 2005-10-08 03:21:17 +00:00
yamt
baee927713 introduce "ufs_ops" and use it for ITIMES. 2005-09-27 06:48:55 +00:00
yamt
050407b699 change um_maxfilesize to unsigned as its on-disk counterpart is. 2005-09-27 06:48:16 +00:00
yamt
d3a07546a6 revert ffs_snapshot.c 1.20 because it's bogus. pointed out by Simon Burge. 2005-09-26 14:10:32 +00:00
yamt
6138b82a56 always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
2005-09-26 13:52:20 +00:00
jmmv
9ba32cead7 Follow compat naming tradition: rename compat_export_args to export_args30. 2005-09-25 21:17:05 +00:00
jmmv
2a3e5eeb7c Apply the NFS exports list rototill patch:
- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
  function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
  file sys/nfs/nfs_export.c.  The former was becoming large and its code
  is always compiled, regardless of the build options.  Using the latter,
  the code is only compiled in when NFSSERVER is enabled.  While doing this,
  also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
  path and a set of export entries.  At the moment it can only clear the
  exports list or append entries, one by one, but it is done in a way that
  allows setting the whole set of entries atomically in the future (see the
  comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
  that it becomes file system agnostic.  In fact, all this whole thing was
  done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
  exports initialization; done internally by the kernel when initializing
  the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
  subsystems can run arbitrary code upon receipt of specific VFS events.
  At the moment, this only provides support for unmount and is used to
  destroy NFS exports lists from the file systems being unmounted, though it
  has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
2005-09-23 12:10:31 +00:00
rpaulo
3c4f143c6e Fix bogus if-clause introduced in previous revision. 2005-09-22 14:04:29 +00:00
rpaulo
a12bed5a16 In ffs_unmount(), detect EOPNOTSUPP errno returned from
ufs_extattr_stop().

From FreeBSD.
2005-09-22 13:50:55 +00:00
rpaulo
1b8fb7a81f In ufs_extattr_stop(), if we haven't started yet, errno must be set
before bailing out.

From FreeBSD.
2005-09-22 13:49:03 +00:00
yamt
6dadccf7c5 ufs_balloc_range: correct range to clear PG_RDONLY.
fix a panic in ubc_fault.
2005-09-14 10:33:25 +00:00
christos
ebc4ea57cf redefine panic if we are a user program. 2005-09-13 04:40:42 +00:00
christos
3544d898ac split out lfs_itimes(). It is used in fsck_lfs. 2005-09-13 04:13:25 +00:00
christos
49840169c0 Add another KASSERT. 2005-09-12 20:26:44 +00:00
christos
c93a283e5f - access the ffs and ext2fs itimes functions through a pointer, so that
  if the filesystem is not compiled in, the kernel still links. Probably
  a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
2005-09-12 20:23:03 +00:00
christos
30b59dc1e8 Add a KASSERT like the one ffs has. 2005-09-12 20:21:18 +00:00
drochner
9cde940a73 move the new ffs_itimes() to a better place -- ffs_subr.c is shared with
userland
2005-09-12 20:09:59 +00:00
christos
a12024da06 Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
2005-09-12 16:24:41 +00:00
rpaulo
ffd7544c80 Add missing '$' in __RCSID(). 2005-09-12 16:10:11 +00:00
rpaulo
f2b738e568 In ufs_extattr_start(), unlock uepm_lock when bailing out.
Ok'd by Jason Thorpe.
2005-09-12 16:09:06 +00:00
yamt
d8798fec66 - for pagecache dependency, track which page in the block
  has been written or not individually by (ab)using b_resid
  in pcbp as a bitmap.
- add a comment to explain why it's needed.

PR/15364.  reviewed by Chuck Silvers.
2005-09-09 15:04:07 +00:00
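
The per-page tracking described in d8798fec66 amounts to keeping one bit per
page of a block in an integer field. A minimal sketch, assuming at most 32
pages per block; mark_page_written() and block_fully_written() are
hypothetical helpers, and a plain uint32_t stands in for the b_resid field
that the commit reuses.

```c
#include <assert.h>
#include <stdint.h>

/* Record that page number `page` (0-based within the block) was written. */
uint32_t
mark_page_written(uint32_t bitmap, unsigned page)
{
	assert(page < 32);
	return bitmap | (UINT32_C(1) << page);
}

/* Nonzero once every one of the block's npages pages has been written. */
int
block_fully_written(uint32_t bitmap, unsigned npages)
{
	uint32_t all;

	assert(npages >= 1 && npages <= 32);
	all = (npages == 32) ? UINT32_MAX : (UINT32_C(1) << npages) - 1;
	return (bitmap & all) == all;
}
```

A block made of four pages counts as written only after all four bits are
set, regardless of the order in which the pages complete.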
yamt
5b4c989faf revert the code to expand putpage requests to block boundary.
because:
	- it was incomplete in some cases.
	- it can confuse pagedaemon.
see PR/15364 for details.
2005-09-09 15:00:39 +00:00
xtraeme
23ebf62d26 * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
2005-08-30 22:01:12 +00:00
thorpej
e1afed9c2d Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature.  I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress.  There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
2005-08-28 19:37:58 +00:00
yamt
4c32aa5945 PRId64 -> ld in UVMHIST_LOG format strings. 2005-08-24 10:19:43 +00:00
yamt
d5c3f1e190 ufs_readdir: don't leak kernel garbage to userland. 2005-08-23 12:27:47 +00:00
yamt
3f2c6f0661 ufs_readdir: when computing the maximum number of entries,
use _DIRENT_RECLEN(cdp, 1) instead of "4".
2005-08-23 12:27:16 +00:00
christos
0b0eb1328b Don't overload MAXNAMLEN, use a separate constant for each filesystem type. 2005-08-23 08:05:13 +00:00
yamt
84c9e5bbc1 whitespace. 2005-08-22 09:08:17 +00:00
christos
b0e192f2b6 change ino_t to u_int32_t for syscall compatibility. 2005-08-22 08:53:03 +00:00
christos
23e602002f now that we've changed the _DIRENT_ALIGN macro, provide a d_fileno for struct
direct
2005-08-19 05:28:48 +00:00
christos
50f8955b6e 64 bit inode changes. 2005-08-19 02:04:03 +00:00
jmmv
38501db2ff Drop extra word from comment. 2005-08-12 22:31:51 +00:00
christos
bce5269120 Move extern kernel variable declarations into a _KERNEL-protected section
so that they don't pollute userland's namespace.
2005-07-31 20:18:32 +00:00
yamt
946832fd33 revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
2005-07-26 12:14:46 +00:00
drochner
e32ba1775e fix crash in mount error handling: don't free storage which was not
malloc'd
2005-07-25 11:42:38 +00:00
yamt
b7bfe82866 update file timestamps for nfsd loaned-read and mmap.
PR/25279.  discussed on tech-kern@.
2005-07-23 12:18:41 +00:00
yamt
6afb995fea ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.
2005-07-21 22:00:08 +00:00
yamt
2a6dc9d02d - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
  page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
  VM_PROT_READ.
2005-07-17 09:13:35 +00:00
thorpej
29af9583d2 Use ANSI function decls. 2005-07-15 05:01:16 +00:00
thorpej
4457fd076f Defflag UFS_DIRHASH. 2005-07-10 01:08:52 +00:00
thorpej
175c3312a8 - Use ANSI function decls.
- Sprinkle some static.
2005-07-10 00:18:52 +00:00
kml
dab4c6d721 Ensure that we change the size of the vnode at the same time as
we change the size of the inode, and use ext2fs_size uniformly.
This fixes a crash that occurs when I create a directory, then
move it, all on an ext2 filesystem.
2005-06-28 16:53:14 +00:00
yamt
44d128fa8e - constify genfs_ops.
- use member designators.
2005-06-28 09:30:37 +00:00
atatat
df13e3579e Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone.  Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
2005-06-20 02:49:18 +00:00
atatat
420d91208b Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code.  I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
2005-06-09 02:19:59 +00:00
dbj
7753d41b8e remove (long) cast on bpref, which is daddr_t 2005-06-06 17:10:25 +00:00
dbj
331e001f0c the cluster summary must be swapped even for ufs2 2005-06-03 01:14:07 +00:00
is
4daeda666d fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz). 2005-06-02 10:08:36 +00:00
christos
c76e17575e s/buf/sbuf. 2005-05-31 02:37:50 +00:00
christos
07d1f24ff5 rename delay because it is a function on sparc. 2005-05-30 22:13:22 +00:00
christos
273df63602 - sprinkle const
- avoid shadow variables.
2005-05-29 21:25:24 +00:00
hannken
a69fbd6a18 - Use an empty snap block list to set the initial file size. Snapshot is
now valid from the beginning.  No need to copy the last fs block two times.
- No need to allocate the cylinder group blocks twice.
- cgbuf -> sbbuf
2005-05-25 11:07:13 +00:00
perseant
96f8f74d91 Don't update lfs_stats.segs_reclaimed if we're not keeping statistics.
Patch from Juan RP.
2005-05-25 01:50:01 +00:00
hannken
ffa83f8f0d ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
  on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
  newest snapshot gets removed. Fixes a rare snapshot corruption when using
  more than one snapshot on a file system.

ufs/ufsmount.h:
  - Make TAILQ_LAST() possible on member um_snapshots.
  - Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
2005-05-22 08:35:28 +00:00
perseant
2ecd1730c0 Keep track of the number of segments reclaimed, since the cleaner doesn't
do this anymore (it hasn't for quite some time).  Add a couple of conditional
debugging messages to indicate why segments are not cleaned, in the event
that lfs_segclean is used.

Make the LFCNSEGWAITALL fcntl work again.
2005-05-20 19:48:25 +00:00
perseant
f8677583c3 VOP_LOCK drops the interlock; pick it up again to avoid an "already unlocked"
panic in lfs_putpages.
2005-05-20 19:09:25 +00:00
perseant
5760c21b8b Fill in the lfs_fsmnt field in the superblock when we mount the filesystem,
so fsck(8) can tell where it was last mounted.
2005-05-20 19:03:11 +00:00
hannken
a71c653aca flush_inodedep_deps(): If softdep_lookupvp() returns NULL it means the
inode has been reclaimed.  Skip the VOP_PUTPAGES() in this case.

Reviewed by: Chuck Silvers <chs@netbsd.org>
2005-05-07 14:24:14 +00:00
perseant
0d41dd0d46 Don't let the pager_map deadlock avoidance code in lfs_putpages() write
segments containing zero-block FINFO records.  These records cause segments
to become uncleanable, which would eventually result in a "no clean segments"
panic.
2005-05-04 04:58:22 +00:00
hannken
cad9d39281 Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
2005-05-03 09:43:23 +00:00
perseant
5ed293c5d5 Recognize that we hold the v_interlock when relocking after a flush in
lfs_putpages.
2005-04-27 20:35:10 +00:00
skrll
d1c90589d8 Use the right arg structure for lfs_setattr, i.e. s/getattr/setattr/. 2005-04-25 06:28:51 +00:00
hannken
dc13562a0c Fix an inconsistency where the last block of the snapshot contains old data.
The last block of the file system is written to the snapshot before the
file system is suspended.  If the last cylinder group is modified after
the file system is suspended the last block of the snapshot may contain
old data.  So update this block again.
2005-04-24 15:49:37 +00:00
perseant
2f695b5476 Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing.  Tested in both
directions, and everything appears to work happily, but ymmv.
2005-04-23 19:47:51 +00:00
yamt
5241cb4bbc don't assign to non-lvalue. found by gcc4. 2005-04-21 14:02:02 +00:00
perseant
f4a7694fc9 Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem).  This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
2005-04-19 20:59:05 +00:00
perseant
f63fa194c2 Check the to-be-on-disk consistency of directories as well (correct a typo
in an earlier commit).
2005-04-18 23:03:08 +00:00
perseant
b2d19f57a3 Check for the inode having been previously freed, in UNMARK_VNODE().
Avoids a panic when calling mkdir() on a full filesystem.
2005-04-18 17:36:46 +00:00
perseant
5923fa20f1 Make userland compile again. 2005-04-16 19:52:09 +00:00
perseant
ad0169af41 Remove left-over reference to "lfs_blist", for _LKM case. 2005-04-16 18:10:12 +00:00
perseant
5ed792ecb0 Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
2005-04-16 17:35:58 +00:00
perseant
94decdd25d Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
2005-04-16 17:28:37 +00:00
perseant
9936b8ce7e Tabify leading whitespace 2005-04-14 00:58:26 +00:00
perseant
f08a1ca4fa Consolidate the hash table we use to maintain the integrity of lfs_avail
into a single, system-wide table, rather than having a separate hash table
per inode.  Significantly reduces the "system" cpu usage of your average
file write.
2005-04-14 00:44:16 +00:00
perseant
2ee78c4fa9 Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated.  Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk.  Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
2005-04-14 00:02:46 +00:00
perseant
af48a6d91c Clean up the handling of the pager_map deadlock in lfs_putpages, after
realizing that it is safe to sleep the second time through the loop.
2005-04-08 00:08:42 +00:00
perseant
c9d4fa4c0d Fix some locking issues that appeared with the simple_lock work.
Address a "pager_map" deadlock in lfs_putpages().
2005-04-06 04:30:46 +00:00
perseant
1ebfc508b6 Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case.  Add debugging segment-lock
assertion statements.
2005-04-01 21:59:46 +00:00
thorpej
e633e8b61b - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
2005-03-29 02:41:05 +00:00
christos
f2b82c7f8a make this compile again :-( 2005-03-26 19:40:31 +00:00
christos
aca59c847f Use vlog(9). Open-coding vlog here breaks lkm's because including
<sys/kprintf.h> includes opt_multiprocessor.h. One could argue
that the lock stuff should just move to subr_prf.c since nothing
else uses it.
2005-03-26 19:39:08 +00:00
perseant
bb7bbb2d16 Don't sleep while holding the vnode interlock. Should take care of the
first panic case in PR #26043.
2005-03-25 01:45:05 +00:00
bouyer
303cafe4e5 getblk() can return NULL if we are the pagedaemon. Check for this. 2005-03-24 20:13:17 +00:00
chs
f31a80ccd3 avoid the need for recursive locking in lfs_flush_dirops() by unlocking
the vnode around the call to this in the caller.
2005-03-24 04:00:33 +00:00
perseant
c716c3d307 Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed.  Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
2005-03-23 00:12:51 +00:00
perseant
8e578e185f Be more careful about handling of flags to lfs_flush, to ensure that
the lfs_writing mutex is respected.
2005-03-09 22:12:15 +00:00
simonb
52c470b886 Tab Police. 2005-03-08 04:49:35 +00:00
perseant
eefd94b8e2 Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled).  Re-enable the LFS
statistics in sysctl, while I'm there.  A bit of a rototill.
2005-03-08 00:18:19 +00:00
perseant
8de99480fa Move "ifile is too large for your NBUFS/BUFPAGES" messages into a function.
Use log(9) to warn the user instead of printf(9).  Since the theory is that
the Ifile is "always in cache", but the greater performance risk is
when the inode entries can't be held in cache, note these two cases
separately, at different log levels (notice and warning, respectively).
2005-03-04 22:19:05 +00:00
christos
cac7cf0758 PR/26823: Michael L. Hitch: Endianness flags were not preserved in the compat
superblock read routine.
2005-03-04 21:45:29 +00:00
perseant
871beffabf Put the ISSPACE() check where it belongs. This allows rewriting a file
on a full filesystem while still returning ENOSPC on an attempt to allocate
new blocks.
2005-03-02 21:16:09 +00:00
perry
bcfcddbac1 nuke trailing whitespace 2005-02-26 22:31:44 +00:00
perseant
25f49c3c91 Various minor LFS improvements:
* Note when lfs_putpages(9) thinks it is not going to be writing any
  pages before calling genfs_putpages(9).  This prevents a situation in
  which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
  overestimate in most cases.  Note that if NRESERVE() is too high, it
  may be impossible to create files on the filesystem.  We catch this
  case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
  entries in indirect blocks again, triggering a failed assertion "daddr
  <= LFS_MAX_DADDR".  Explicitly convert to and from int32_t to correct
  this.
* Add a high-water mark for the number of dirty pages any given LFS can
  hold before triggering a flush.  This is settable by sysctl, but off
  (zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
  shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
  even though their v_size == 0.  Don't panic when we see this.
* Change lfs_bfree to a signed quantity.  The manner in which it is
  processed before being passed to the cleaner means that sometimes it
  may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
  lfs_statvfs(9).  This prevents df(1) from ever telling us that our full
  filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
  associated buffer headers, so that the pagedaemon doesn't run us out
  of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
  unmounted.  Because vfs_busy() is a shared lock, and
  lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
  holding the lock that umount() is blocking on, then try to vfs_busy()
  again in getnewvnode().
2005-02-26 05:40:42 +00:00
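The lfs_statvfs(9) item in the commit above (never report bfree < 0 or higher than lfs_dsize) amounts to clamping a signed counter before it is exposed as an unsigned value. A minimal sketch of that idea follows; the helper name clamp_bfree and its parameter types are assumptions for illustration and are not taken from the LFS sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD tree): clamp a signed
 * free-block count into the range [0, dsize] before reporting it.
 * Without the lower clamp, a negative count converted to an unsigned
 * field would show up as an enormous amount of free space.
 */
uint64_t
clamp_bfree(int64_t bfree, uint64_t dsize)
{
	if (bfree < 0)
		return 0;		/* never report negative free space */
	if ((uint64_t)bfree > dsize)
		return dsize;		/* never report more than the data area */
	return (uint64_t)bfree;
}
```

With this sketch, clamp_bfree(-5, 100) yields 0 and clamp_bfree(150, 100) yields 100, while in-range values pass through unchanged.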
hannken
1d85e05ec4 Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
2005-02-21 17:52:11 +00:00
dsl
bd99144a6b change ffs_snapshot to !ffs_no_snapshot 2005-02-18 21:15:38 +00:00
chs
9cc4bd69b2 fix typo in previous. 2005-02-14 02:22:48 +00:00
dsl
25579adfb4 Make ffs snapshots be enabled by 'option FFS_SNAPSHOT' 2005-02-10 22:23:19 +00:00
dsl
2f6f4269bd Add a stub file so that snapshot support can be compiled out.
Will allow INSTALL_TINY to fit back in its designated space.
Since the calling code doesn't allow a snapshot mount to fail, this code
will output a warning and delete any snapshots it finds.
This only happens on rw mounts - snapshots don't seem to be created
when mounting ro.
The whole way the snapshots get mounted is a PITA anyway, the superblock
'last mounted' time should be used to validate that the fs hasn't been
mounted elsewhere.
2005-02-10 22:22:32 +00:00
ws
5387f7217c Add support for large files (>2GB).
Like Linux, automagically convert old filesystem to use this,
if they are already at revision 1.
For revision 0, just punt (unlike Linux; makes me a bit too nervous.)

There should be an option to fsck_ext2fs to upgrade revision 0 to revision 1.

Reviewed by Manuel (bouyer@).
2005-02-09 23:02:10 +00:00
hannken
8bb5af4d2e Fss device only checks read access to snapshot vnode. On snapshot creation
check we are either super-user or owner of the snapshot vnode.
2005-02-09 16:05:29 +00:00
hannken
c13136f43f No longer needed. Ffs snapshots are enabled by default. 2005-01-31 22:21:17 +00:00
hannken
d5fbb6936f Add file system snapshots to kernel configs.
- Ffs internal snapshots get compiled in unconditionally.

- File system snapshot device fss(4) added to all kernel configs that
  have a disk.  Device is commented out on all non-GENERIC kernels.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
2005-01-31 16:54:32 +00:00
wrstuden
442d792d00 Fix pasto in previous. We only perform the DIOCCACHESYNC call if
FSYNC_CACHE is set, not if FSYNC_WAIT is set.
2005-01-27 02:16:42 +00:00
wrstuden
e384a44e9d Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
2005-01-25 23:55:20 +00:00
dbj
d681cb1ea9 check _KERNEL_OPT instead of !_LKM to conditionalize opt includes 2005-01-24 21:34:48 +00:00
rumble
468646676a Remove dirhash.h. 2005-01-24 01:32:22 +00:00
rumble
32386a4e99 Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
2005-01-23 19:37:05 +00:00
hannken
5b0fdd5c72 Protect calls to `ffs_*_swap' with `#ifdef FFS_EI'. 2005-01-18 10:40:21 +00:00
mycroft
7f1fe4e81f Rearrange some code slightly to avoid uninitialized variable warnings. 2005-01-11 00:19:36 +00:00
chs
8975a0856f adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway.  there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
2005-01-09 16:42:43 +00:00
mycroft
e72fc6717e Whoops -- move the location of the VOP_OPEN()/VOP_CLOSE(), et al, from
foo_mountfs() to foo_mount(), to match the new mountroot API.
Also, for ext2fs and lfs, copy some restructuring from ffs to allow changing
file system parameters without specifying the device name.
(ntfs could use some more work.)
2005-01-09 09:27:17 +00:00
mycroft
0461b30ac3 Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions.  This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC.  I've been running this for a
long time.
2005-01-09 03:11:48 +00:00
thorpej
1c95472d01 Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
2005-01-02 16:08:28 +00:00
dbj
0c5a27af69 remove opt_compat_netbsd.h, afaict it is no longer needed.
i think it was previously used to pull in COMPAT_09 for ffs_statfs
2004-12-26 17:34:39 +00:00
dbj
9b0bad335a use #if defined(_KERNEL_OPT) around opt includes
fix arg to pool_init() when _LKM is defined
2004-12-20 03:12:20 +00:00
mycroft
5ac91d4849 Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
2004-12-15 07:11:51 +00:00
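The cast problem described in the commit above can be illustrated in isolation: routing a 32-bit on-disk block address through int32_t before widening it to a 64-bit type sign-extends the top bit. The sketch below is only an illustration; the function names are hypothetical, and int64_t stands in for a 64-bit daddr_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration (names are not from the NetBSD tree).
 * Widening through int32_t sign-extends, so any address with the top
 * bit set becomes negative.  The uint32_t -> int32_t conversion of an
 * out-of-range value is implementation-defined in C; this sketch
 * assumes the usual two's-complement wraparound.
 */
int64_t
widen_via_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Widening directly from the unsigned type keeps all 32 bits intact. */
int64_t
widen_via_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

For raw = 0x80000000, widen_via_unsigned() returns 2147483648 while widen_via_signed() returns a negative value on two's-complement platforms; for addresses below 0x80000000 the two agree.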