Commit Graph

100 Commits

Author SHA1 Message Date
christos 95e1ffb156 merge ktrace-lwp. 2005-12-11 12:16:03 +00:00
yamt 6a17dd42f4 - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge.  PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
2005-11-11 15:50:57 +00:00
yamt a748ea88dd merge yamt-vop branch. remove following VOPs.
VOP_BLKATOFF
	VOP_VALLOC
	VOP_BALLOC
	VOP_REALLOCBLKS
	VOP_VFREE
	VOP_TRUNCATE
	VOP_UPDATE
2005-11-02 12:38:58 +00:00
christos a12024da06 Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
2005-09-12 16:24:41 +00:00
christos 273df63602 - sprinkle const
- avoid shadow variables.
2005-05-29 21:25:24 +00:00
perseant 2f695b5476 Provide a resize_lfs(8), including kernel and cleaner support. The current
implementation requires the fs to be mounted while resizing.  Tested in both
directions, and everything appears to work happily, but ymmv.
2005-04-23 19:47:51 +00:00
perseant 5ed792ecb0 Use splay trees, rather than a hash table, to manage the accounting of
blocks allocated through VOP_BALLOC() for pages to be written to disk.
This accounting no longer takes a noticeable fraction of the system CPU.
2005-04-16 17:35:58 +00:00
perseant 94decdd25d Use lfs_malloc() to manage the blkiov arrays that the cleaner functions use,
since the cleaner is likely to operate in a low-memory condition.
2005-04-16 17:28:37 +00:00
perseant 2ee78c4fa9 Keep track of the highest block held by an LFS inode, so that we can
be assured that the last byte of a file is always allocated.  Previously
a file extension could cause the filesystem to be flushed, writing an
inconsistent inode to disk.  Although this condition would be corrected
the next time blocks were written to disk, an intervening crash would leave
the filesystem in an inconsistent state, leaving fsck_lfs to complain
of an inode "partially truncated".
2005-04-14 00:02:46 +00:00
perseant 1ebfc508b6 Protect various per-fs structures with fs->lfs_interlock simple_lock, to
improve behavior in the multiprocessor case.  Add debugging segment-lock
assertion statements.
2005-04-01 21:59:46 +00:00
perseant eefd94b8e2 Straighten out the maze of ifdefs. Instead, consolidate all the debugging
stuff under '#ifdef DEBUG', and use sysctl knobs to turn on/off particular
parts of the debugging reporting (if DEBUG is enabled).  Re-enable the LFS
statistics in sysctl, while I'm there.  A bit of a rototill.
2005-03-08 00:18:19 +00:00
perry bcfcddbac1 nuke trailing whitespace 2005-02-26 22:31:44 +00:00
mycroft bb17450999 Don't write out the extra zero pages with PGO_SYNCIO. We start an asynchronous
write anyway, and they will not be freed until that write is finished.
2004-08-15 19:01:16 +00:00
mycroft 4303882b7e Copy the current partial-truncate logic from FFS. In the process, fix a
potential overrun when truncating a fragment.
2004-08-15 17:37:07 +00:00
mycroft f3fbefe76a Minor simplification to some arithmetic. 2004-08-15 16:17:37 +00:00
mycroft 45a21b76f0 Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file systems returns
  d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
  in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
  FS-specific checks littered throughout the code.  This may be used later to
  make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
  to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
2004-08-15 07:19:54 +00:00
mycroft bc25b30608 Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync.  This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times.  They
are now updated in the inode only opportunistically, or when the file or device
is closed.  (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways.  In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced.  In ext2fs, it was causing the metadata to *not* be synced.  We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY.  I've confirmed
that we do in fact trickle correctly now.
2004-08-14 01:08:02 +00:00
oster 87d110abfa If we bail out due to an error, we need 'unreserve' the space that
we'd reserved earlier.

Approved by: yamt
2004-03-30 14:50:46 +00:00
hannken 3db4e2acd8 Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.
VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp)  Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp)      Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
2004-01-25 18:06:48 +00:00
pk 70f20a1217 Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes).  It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms.  Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
2003-12-30 12:33:13 +00:00
yamt cd2445d8d3 more assertion about file truncation to zero. 2003-11-07 14:48:28 +00:00
agc aad01611e7 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
2003-08-07 16:26:28 +00:00
yamt 3852db2096 - protect global resource counts with lfs_subsys_lock.
- clean up scattered externs a little.
2003-07-12 16:17:06 +00:00
fvdl d5aece61d6 Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.
2003-06-29 22:28:00 +00:00
darrenr 960df3c8d1 Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records.  The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
2003-06-28 14:20:43 +00:00
yamt c2b802ff24 fix b_interlock lock/unlock mismatches. 2003-04-27 06:46:38 +00:00
perseant ef3c60764c Make LFS work better (though still not "well") as an NFS-exported
filesystem (and other things that needed to be fixed before the tests
would complete), to wit:

* Include the fs ident in the filehandle; improve stale filehandle checks.

* Change definition of blksize() to use the on-dinode size instead of
  the inode's i_size, so that fsck_lfs will work properly again.

* Use b_interlock in lfs_vtruncbuf.

* Postpone dirop reclamation until after the seglock has been released,
  so that lfs_truncate is not called with the segment lock held.

* Don't loop in lfs_fsync(), just write everything and wait.

* Be more careful about the interlock/uobjlock in lfs_putpages: when we
  lose this lock, we have to resynchronize dirtiness of pages in each
  block.

* Be sure to always write indirect blocks and update metadata in
  lfs_putpages; fixes a bug that caused blocks to be accounted to the
  wrong segment.
2003-04-23 07:20:37 +00:00
simonb 761de7345c '#if 0' out a variable that is currently only used in other '#if 0'd out
code.
2003-04-10 04:15:38 +00:00
fvdl 42614ed3f3 Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
2003-04-02 10:39:19 +00:00
perseant c364d884f6 Hold the segment lock during truncation to prevent indirect blocks from
being written by lfs_updatemeta while lfs_truncate is also writing them,
a bug pointed out by YAMAMOTO Takashi <yamt@netbsd.org>.
2003-03-20 06:47:38 +00:00
perseant 8feb2c22f5 Take away "#ifdef LFS_UBC". 2003-03-08 21:46:04 +00:00
perseant 958a4c008c Don't force all truncations to be synchronous 2003-03-04 19:10:35 +00:00
perseant cfc73a5fa9 Be careful to always zero pages on truncation/fragment extension,
in the case where the filesystem block size is larger than PAGE_SIZE.
2003-03-01 05:07:51 +00:00
perseant daeb6c37d1 Make lfs_truncate handle file extension correctly, in the LFS_UBC case. 2003-02-28 07:37:56 +00:00
perseant a94f9407dc Quell a hasty panic in lfs_truncate: on-inode disk addresses can be
different between the beginning and end of the call.
2003-02-28 04:37:07 +00:00
perseant fdf4bfe002 Tabify, and fix some comment alignment problems. 2003-02-20 04:27:23 +00:00
perseant b397c875ae Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon.  To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
  writes for the pagedaemon, but which also takes over some of the
  functions of lfs_check().  This thread is started the first time an
  LFS is mounted.

* Add a "flags" parameter to GOP_SIZE.  Current values are
  GOP_SIZE_READ, meaning that the call should return the size of the
  in-core version of the file, and GOP_SIZE_WRITE, meaning that it
  should return the on-disk size.  One of GOP_SIZE_READ or
  GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
  resources to get by and use malloc(...M_NOWAIT), using the reserves if
  necessary.  Use the pool subsystem for structures small enough that
  this is feasible.  This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
  structure; getting closer to LFS as an LKM.  "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
  checkpoint; any segments that pass two checkpoints both dirty and
  empty can be summarily cleaned.  Do this.  Right now lfs_segclean
  still works, but this should be turned into an effectless
  compatibility syscall.
2003-02-17 23:48:08 +00:00
fvdl a138610cac The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
2003-01-25 16:40:28 +00:00
fvdl a3ff3a3038 Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
2003-01-24 21:55:02 +00:00
yamt 59be5399b7 - in lfs_reserve, vref vnodes that we're locking so that cleaner doesn't
try to reclaim them.
  (workaround for deadlock noted in the comment in lfs_reserveavail)
- in lfs_rename, mark vnodes which are being moved as well as directry vnodes.
2002-12-28 14:39:08 +00:00
provos 0f09ed48a5 remove trailing \n in panic(). approved perry. 2002-09-27 15:35:29 +00:00
perseant 32ae84b188 Deal with fragment size changes better. For each fragment that can
exist on an on-disk inode, we keep a record of its size in struct inode,
which is updated when we write the block to disk.  The cleaner routines
thus have ready access to what size is the correct size for this block,
on disk.

Fixed a related bug: if a file with fragments is being cleaned
(fragments being cleaned) at the same time it is being extended beyond
NDADDR blocks, we could write a bogus FINFO record that has a frag in the
middle; when it was cleaned this would give back bogus file data.  Don't
write the indirect blocks in this case, since there is no need.

lfs_fragextend and lfs_truncate no longer require the seglock, but instead
take a shared lock, which the seglock locks exclusively.
2002-07-06 01:30:11 +00:00
yamt d566a58b5e fix printf format for DEBUG_LFS. 2002-07-02 19:07:03 +00:00
perseant 8886b0f4b2 Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
  This should make it possible to (1) write clusters using a page mapping
  instead of malloc, if desired, and (2) schedule blocks for rewriting
  (somewhere else) if a write error occurs.  Code is present to use
  pagemove() to construct the clusters but that is untested and will go away
  anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
  the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
  can be called by lfs_segwrite, and loop on that until it is clean, for
  a checkpoint.  Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
  buffer cache.  Both lfs_mountfs and lfs_unmount check since the Ifile can
  grow.
* If an inode is not found in a disk block, try rereading the block, under
  the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
2002-05-14 20:03:53 +00:00
chs a106161b5a add spaces for KNF. confirmed to produce identical objects. 2001-11-23 21:44:25 +00:00
lukem ec6245465a add RCSID 2001-11-08 02:39:06 +00:00
simonb c56d879335 Remove some variables that are set but never used. 2001-11-06 07:11:29 +00:00
chs 64c6d1d2dc a whole bunch of changes to improve performance and robustness under load:
- remove special treatment of pager_map mappings in pmaps.  this is
   required now, since I've removed the globals that expose the address range.
   pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
   no longer any need to special-case it.
 - eliminate struct uvm_vnode by moving its fields into struct vnode.
 - rewrite the pageout path.  the pager is now responsible for handling the
   high-level requests instead of only getting control after a bunch of work
   has already been done on its behalf.  this will allow us to UBCify LFS,
   which needs tighter control over its pages than other filesystems do.
   writing a page to disk no longer requires making it read-only, which
   allows us to write wired pages without causing all kinds of havoc.
 - use a new PG_PAGEOUT flag to indicate that a page should be freed
   on behalf of the pagedaemon when it's unlocked.  this flag is very similar
   to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
   pageout fails due to eg. an indirect-block buffer being locked.
   this allows us to remove the "version" field from struct vm_page,
   and together with shrinking "loan_count" from 32 bits to 16,
   struct vm_page is now 4 bytes smaller.
 - no longer use PG_RELEASED for swap-backed pages.  if the page is busy
   because it's being paged out, we can't release the swap slot to be
   reallocated until that write is complete, but unlike with vnodes we
   don't keep a count of in-progress writes so there's no good way to
   know when the write is done.  instead, when we need to free a busy
   swap-backed page, just sleep until we can get it busy ourselves.
 - implement a fast-path for extending writes which allows us to avoid
   zeroing new pages.  this substantially reduces cpu usage.
 - encapsulate the data used by the genfs code in a struct genfs_node,
   which must be the first element of the filesystem-specific vnode data
   for filesystems which use genfs_{get,put}pages().
 - eliminate many of the UVM pagerops, since they aren't needed anymore
   now that the pager "put" operation is a higher-level operation.
 - enhance the genfs code to allow NFS to use the genfs_{get,put}pages
   instead of a modified copy.
 - clean up struct vnode by removing all the fields that used to be used by
   the vfs_cluster.c code (which we don't use anymore with UBC).
 - remove kmem_object and mb_object since they were useless.
   instead of allocating pages to these objects, we now just allocate
   pages with no object.  such pages are mapped in the kernel until they
   are freed, so we can use the mapping to find the page to free it.
   this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
2001-09-15 20:36:31 +00:00
perseant 4e3fced95b Merge the short-lived perseant-lfsv2 branch into the trunk.
Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default.  Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
  matched to convenient physical characteristics of the partition (e.g.,
  stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
  non-512-byte-sector devices.  In theory fragments can be as large
  as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
  doesn't get old data and think it's new.  Roll-forward is enabled for
  v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
  is not yet implemented, but can be without further non-backwards-compatible
  changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
  that is, the inode is never written *just* because atime was changed.
  Because of this the inodes remain near the file data on the disk, rather
  than wandering all over as the disk is read repeatedly.  This speeds up
  repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length, it is no
  longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
  I need to look more closely to make sure that the times are only updated
  during write(2) and friends, not after-the-fact during a segment write,
  and certainly not by the cleaner.
2001-07-13 20:30:18 +00:00
mrg 67afbd6270 use _KERNEL_OPT 2001-05-30 11:57:16 +00:00