Commit Graph

302 Commits

Author SHA1 Message Date
christos
273df63602 - sprinkle const
- avoid shadow variables.
2005-05-29 21:25:24 +00:00
hannken
ffa83f8f0d ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
  on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
  newest snapshot gets removed. Fixes a rare snapshot corruption when using
  more than one snapshot on a file system.

ufs/ufsmount.h:
  - Make TAILQ_LAST() possible on member um_snapshots.
  - Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
2005-05-22 08:35:28 +00:00
perseant
f4a7694fc9 Keep per-inode, per-fs, and subsystem-wide counts of blocks allocated through
lfs_balloc(), and use that to estimate the number of dirty pages belonging
to LFS (subsystem or filesystem).  This is almost certainly wrong for
the case of a large mmap()ed region, but the accounting is tighter than
what we had before, and performs much better in the typical case of pages
dirtied through write().
2005-04-19 20:59:05 +00:00
perseant
1ebfc508b6 Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case.  Add debugging segment-lock
assertion statements.
2005-04-01 21:59:46 +00:00
perseant
c716c3d307 Make LFS dirops get their vnode first, before incrementing the dirop count,
to prevent a deadlock trying to call VOP_PUTPAGES() on a VDIROP vnode.
This can happen when a stacked filesystem is mounted on top of an LFS: an
LFS dirop needs to get a vnode, which is available from the upper layer.
The corresponding lower layer vnode, however, is VDIROP, so the upper layer
can't be cleaned out since its VOP_PUTPAGES() is passed through to the lower
layer, which waits for dirops to drain before it can proceed.  Deadlock.

Tweak ufs_makeinode() and ufs_mkdir() to pass the a_vpp argument through
to VOP_VALLOC().

Partially addresses PR # 26043, though it probably does not completely fix
the problem described there.
2005-03-23 00:12:51 +00:00
perry
bcfcddbac1 nuke trailing whitespace 2005-02-26 22:31:44 +00:00
perseant
25f49c3c91 Various minor LFS improvements:
* Note when lfs_putpages(9) thinks it is not going to be writing any
  pages before calling genfs_putpages(9).  This prevents a situation in
  which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
  overestimate in most cases.  Note that if NRESERVE() is too high, it
  may be impossible to create files on the filesystem.  We catch this
  case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
  entries in indirect blocks again, triggering a failed assertion "daddr
  <= LFS_MAX_DADDR".  Explicitly convert to and from int32_t to correct
  this.
* Add a high-water mark for the number of dirty pages any given LFS can
  hold before triggering a flush.  This is settable by sysctl, but off
  (zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
  shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
  even though their v_size == 0.  Don't panic when we see this.
* Change lfs_bfree to a signed quantity.  The manner in which it is
  processed before being passed to the cleaner means that sometimes it
  may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
  lfs_statvfs(9).  This prevents df(1) from ever telling us that our full
  filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
  associated buffer headers, so that the pagedaemon doesn't run us out
  of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
  unmounted.  Because vfs_busy() is a shared lock, and
  lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
  holding the lock that umount() is blocking on, then try to vfs_busy()
  again in getnewvnode().
2005-02-26 05:40:42 +00:00
dbj
d681cb1ea9 check _KERNEL_OPT instead of !_LKM to conditionalize opt includes 2005-01-24 21:34:48 +00:00
rumble
468646676a Remove dirhash.h. 2005-01-24 01:32:22 +00:00
rumble
32386a4e99 Bring in Ian Dowse's Dirhash from FreeBSD. Hash tables of
directories are created on the fly and used to increase
performance by circumventing ufs_lookup's linear search.

Dirhash is enabled by the UFS_DIRHASH option, but not
by default.
2005-01-23 19:37:05 +00:00
chs
8975a0856f adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway.  there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
2005-01-09 16:42:43 +00:00
dbj
9b0bad335a use #if defined(_KERNEL_OPT) around opt includes
fix arg to pool_init() when _LKM is defined
2004-12-20 03:12:20 +00:00
mycroft
5ac91d4849 Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
2004-12-15 07:11:51 +00:00
dbj
c45b4268dc remove diagnostic check for modified inactive inodes in ufs_reclaim
this condition can occur if ufs_inactive experiences failure attempting
to write the inode out.  Instead, have ufs_reclaim always call VOP_UPDATE
which will only write out the inode if there are unflushed changes
2004-10-08 18:43:50 +00:00
thorpej
11afd11faa Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made.  Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP.  Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
  set.  Some file systems (e.g. specfs) don't even bother with vnode
  locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
  vnode locking problems, not the least of which is the "use SHARED for
  VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
2004-09-21 03:10:35 +00:00
skrll
f7155e40f6 There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
2004-09-17 14:11:20 +00:00
yamt
6f3db818ea ufs_getlbns:
- fix an integer overflow when calculating lbns of indirect blocks.
- remove a redundant calculation of blockcnt.
2004-09-15 09:52:49 +00:00
yamt
77641b98f3 g/c no longer used definition of fs_maxfilesize. 2004-09-10 09:36:05 +00:00
mycroft
84b11bd3be Another piece of FFS_EI flotsam. 2004-08-15 21:53:03 +00:00
mycroft
0d7e3746eb Repair some FFS_EI code for ufsmount changes. 2004-08-15 21:34:14 +00:00
mycroft
f94beb30d4 Fix some formatting glitches. 2004-08-15 16:46:18 +00:00
mycroft
f3fbefe76a Minor simplification to some arithmetic. 2004-08-15 16:17:37 +00:00
mycroft
45a21b76f0 Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
  d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
  in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
  FS-specific checks littered throughout the code.  This may be used later to
  make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
  to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
2004-08-15 07:19:54 +00:00
mycroft
c09a793e93 Push atime/mtime updates even further -- into the reclaim path, so they happen
rarely in the normal case.  (Note: This happens at reboot/shutdown time because
all file systems are unmounted.)

Also, for IN_MODIFY, use IN_ACCESSED, not IN_MODIFIED; otherwise "ls -l" of
your device node or FIFO would cause the time stamps to get written too
quickly.
2004-08-14 14:32:04 +00:00
mycroft
bc25b30608 Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync.  This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times.  They
are now updated in the inode only opportunistically, or when the file or device
is closed.  (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways.  In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced.  In ext2fs, it was causing the metadata to *not* be synced.  We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY.  I've confirmed
that we do in fact trickle correctly now.
2004-08-14 01:08:02 +00:00
dbj
cf1784812c remove incorrect casts that limit some uses of daddr_t to 31 bits
this fixes problems using ffs2 with more than 2^31 sectors (~1tb)
2004-07-24 15:02:32 +00:00
hannken
2c825e5573 Use a pool for struct direct instead of kernel stack.
Reduces the kernel stack usage by 264 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
2004-06-20 18:25:49 +00:00
hannken
8c21bc6224 Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.
- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
    may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
    Snapshots may not be opened for writing and the attributes are read-only.
    Use the mtime as the time this snapshot was taken.
    Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
  one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
  a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
2004-05-25 14:54:55 +00:00
kleink
25709a1d3a POSIX: Permit a process without the appropriate privilege to change a
file's group ID to its effective gid, in addition to the presently
permitted set of supplementary gids.

From Mark Davies in PR standards/25401.
2004-05-22 23:24:23 +00:00
jrf
fc97fd571a First pass for some caddr_t removal and changes to get rid of it where we
no longer use and/or need it

	- removed casts from unionfs, deadfs and fdesc
	  (there are more to hunt down still)
	- changed vfs_quotactl args argumet from caddr_t to void *
	- changed vfs_quotactl structures/callers to reflect the api change

Compiled fine and ran for about a day. Approved/reviewed by
christos@netbsd.org and gimpy@netbsd.org.
2004-04-27 17:37:30 +00:00
christos
6bd1d6d4db Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
2004-04-21 01:05:31 +00:00
yamt
25a0a3496e revert ufs_lookup.c rev.1.53 (MNT_ASYNC changes)
it was redundant because our bwrite() knows about MNT_ASYNC.

ok'ed by Jaromir Dolecek and Chuck Silvers.
2004-03-06 06:54:12 +00:00
uwe
20c14b226b Shut up gcc3 warning that `metalbn' might be used uninitialized.
XXX: The warning is bogus and only triggered on sh3.
2004-02-27 00:19:36 +00:00
hannken
d6170777cf Fix xxx_strategy() to use the vnode arg instead of bp->b_vp. 2004-01-26 10:39:29 +00:00
hannken
84b45bc333 Fix mfs_strategy() to use the vp argument.
From YAMAMOTO Takashi <yamt@netbsd.org>.
2004-01-26 10:02:31 +00:00
itojun
c1675e235e avoid panic on monut_mfs. Greg Oster 2004-01-26 04:25:02 +00:00
hannken
3db4e2acd8 Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.
VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp)  Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp)      Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
2004-01-25 18:06:48 +00:00
yamt
7266a95907 store a i/o priority hint in struct buf for buffer queue discipline. 2004-01-10 14:39:50 +00:00
dbj
01637061e3 never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
2004-01-09 19:10:22 +00:00
dbj
fe70f4f25f only let i_ffs_effnlink diverge from i_nlink if DOINGSOFTDEP 2003-11-08 06:38:10 +00:00
dbj
54309cb4b4 comment out unnecessary IN_CHANGE|IN_UPDATE in lookup
move softdep specific lock release/regrab inside if DOINGSOFTDEP
2003-11-08 06:36:13 +00:00
hannken
2ef662a69e Clean up the usage of vn_start_write(). At least one occurence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
2003-11-05 10:18:38 +00:00
hannken
a3a898ff0f Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
2003-10-15 11:28:59 +00:00
bouyer
7b066791c8 Remove references to University of California from my copyright notices. 2003-10-05 17:48:49 +00:00
jdolecek
d07d321142 if mounted ASYNC, use delayed writes for metadata, which improves performance
of these operations significantly
based on FreeBSD ufs_lookup.c rev. 1.8, by John Dyson
2003-09-20 21:05:53 +00:00
yamt
7bd2f524bc indent. 2003-09-15 15:08:09 +00:00
christos
136f9cd821 PR/15397: Jason Thorpe: directory operations on pathnames that refer to
directories and have trailing slashes should succeed. Ok'd by kjk.
Fix provided by enami.
2003-09-11 17:33:42 +00:00
dsl
72fa43211e gcc for sparc seems to barf at 'int_var * 0x100000000ull' so do
'(uint64_t)(uint)int_var << 32' even though it generates twice as many
instructions on i386!
2003-08-16 07:04:17 +00:00
dsl
8d9878ecb4 Remove only (last?) use of SETHIGH and SETLOW before gcc starts warning
about the odd construct.  Also fixes kern/6525.
2003-08-10 09:54:06 +00:00
dsl
5638922967 Stop panic if 'mknod xxx b 0 0' done on a full filesystem.
panics in ffs_full_fsync because v_specmountpoint requires that the NULL
v_specinfo be followed.
Tidy up in the same order in all error paths so compiler can merge the
code sequences.

Fixes PR kern/22419
2003-08-09 19:02:53 +00:00