Commit Graph

299 Commits

Author SHA1 Message Date
lukem b3b9740195 add __KERNEL_RCSID() 2001-10-30 01:11:53 +00:00
lukem 80ac606906 ffs_sb_swap() fixes:
- calculate the offset and length of the postbl before byteswapping.
  problem noted by der Mouse.
- use offsetof() to determine # of fields to calculate in initial
  loop, rather than hard-coding in `52 fields'
- improve comments.
2001-10-29 11:26:35 +00:00
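Illustration for 80ac606906 (not part of the commit): a minimal sketch of deriving the loop count from offsetof() rather than a hard-coded field count. The struct demo_sb layout and every demo_* name are hypothetical and are not taken from ffs_bswap.c; the sketch only assumes a run of 32-bit fields followed by a field of another type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature superblock: a run of 32-bit fields, then other data. */
struct demo_sb {
	int32_t sb_first;
	int32_t sb_size;
	int32_t sb_ncg;
	int32_t sb_bsize;
	uint8_t sb_label[16];	/* first field that is NOT a plain int32_t */
};

/* Reverse the byte order of one 32-bit value. */
static uint32_t
demo_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Number of leading int32_t fields, derived from the layout itself. */
static size_t
demo_sb_nfields(void)
{
	return offsetof(struct demo_sb, sb_label) / sizeof(int32_t);
}

/* Byte-swap every leading 32-bit field of `o' into `n'. */
static void
demo_sb_swap(const struct demo_sb *o, struct demo_sb *n)
{
	const uint32_t *src = (const uint32_t *)o;
	uint32_t *dst = (uint32_t *)n;
	size_t i;

	for (i = 0; i < demo_sb_nfields(); i++)
		dst[i] = demo_bswap32(src[i]);
}
```

If a field is later added before sb_label, the count follows the layout automatically, which is the point of avoiding a literal such as `52 fields'.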
lukem 6f39841c03 - pull in ufsmount.h after inode.h, because the latter pulls in
quota.h which the former needs, and this makes the usage consistent
  with other files anyway
- expand the details in a few panic strings
2001-10-26 06:37:55 +00:00
lukem 99147a7648 remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former.  leave the former
in source that obviously uses specific bits of it (for completeness.)
2001-10-26 05:56:06 +00:00
chs d4406ff8c9 in ffs_balloc(), clean up page cache state to avoid hangs when we
get ENOSPC. as a result of this, we now skip some of the normal cleanup
in ufs_balloc_range() in the error case.
2001-09-30 02:54:42 +00:00
chs 299934b2ed handle allocation errors in truncate-up case. 2001-09-28 11:43:23 +00:00
chs d288111138 undo the part of the previous revision about skipping
the put if there are no pages, that seems to cause some problem.
fix another problem with missing an splx(), spotted by enami.
2001-09-26 06:20:50 +00:00
chs e8be8c6351 be sure to call the pager put with page-aligned offsets.
spotted by Nathan Williams.

while I'm here, move an splbio() so that we don't return without
splx()ing it if there's an error, and don't bother calling the
pager put if the vnode has no pages.
2001-09-26 05:25:03 +00:00
sommerfeld 181c4513dc Add fifo_putpages() placebo so that the vnode's uobj is unlocked. 2001-09-22 22:35:18 +00:00
chs 3be896ac31 we can't assert that the inode and vnode sizes are consistent at the start
of ffs_truncate() since there are cases (eg. when ffs_write() gets ENOSPC)
where they should be different.  move the assert to the end instead.
2001-09-20 08:25:59 +00:00
lukem 9c5c77ae54 - ffs_blkpref() changes:
	- don't bother updating fs->fs_cgrotor, since it's actually not used in
	  the kernel. from Manuel Bouyer in [kern/3389]
	- when examining cylinder groups from startcg to startcg-1 (wrapping
	  at fs->fs_ncg), there's no need to check startcg at the end as well
	  as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
2001-09-19 01:38:16 +00:00
jdolecek 68aacb8f70 add softdep_reinitialize() stub 2001-09-16 13:51:45 +00:00
chs 64c6d1d2dc a whole bunch of changes to improve performance and robustness under load:
- remove special treatment of pager_map mappings in pmaps.  this is
   required now, since I've removed the globals that expose the address range.
   pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
   no longer any need to special-case it.
 - eliminate struct uvm_vnode by moving its fields into struct vnode.
 - rewrite the pageout path.  the pager is now responsible for handling the
   high-level requests instead of only getting control after a bunch of work
   has already been done on its behalf.  this will allow us to UBCify LFS,
   which needs tighter control over its pages than other filesystems do.
   writing a page to disk no longer requires making it read-only, which
   allows us to write wired pages without causing all kinds of havoc.
 - use a new PG_PAGEOUT flag to indicate that a page should be freed
   on behalf of the pagedaemon when it's unlocked.  this flag is very similar
   to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
   pageout fails due to eg. an indirect-block buffer being locked.
   this allows us to remove the "version" field from struct vm_page,
   and together with shrinking "loan_count" from 32 bits to 16,
   struct vm_page is now 4 bytes smaller.
 - no longer use PG_RELEASED for swap-backed pages.  if the page is busy
   because it's being paged out, we can't release the swap slot to be
   reallocated until that write is complete, but unlike with vnodes we
   don't keep a count of in-progress writes so there's no good way to
   know when the write is done.  instead, when we need to free a busy
   swap-backed page, just sleep until we can get it busy ourselves.
 - implement a fast-path for extending writes which allows us to avoid
   zeroing new pages.  this substantially reduces cpu usage.
 - encapsulate the data used by the genfs code in a struct genfs_node,
   which must be the first element of the filesystem-specific vnode data
   for filesystems which use genfs_{get,put}pages().
 - eliminate many of the UVM pagerops, since they aren't needed anymore
   now that the pager "put" operation is a higher-level operation.
 - enhance the genfs code to allow NFS to use the genfs_{get,put}pages
   instead of a modified copy.
 - clean up struct vnode by removing all the fields that used to be used by
   the vfs_cluster.c code (which we don't use anymore with UBC).
 - remove kmem_object and mb_object since they were useless.
   instead of allocating pages to these objects, we now just allocate
   pages with no object.  such pages are mapped in the kernel until they
   are freed, so we can use the mapping to find the page to free it.
   this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
2001-09-15 20:36:31 +00:00
chs 5d3eefe245 use pools for allocating most softdep datastructures. since we want to
allocate memory from kernel_map but some of the objects are freed from
interrupt context, we put objects on a queue instead of freeing them
immediately.  then in softdep_process_worklist() (which is called at
least once per second from the syncer), we process that queue and
free all the objects.  allocating from kernel_map instead of from kmem_map
allows us to have a much larger number of softdeps pending even in
configurations where kmem_map is relatively small.
2001-09-15 16:33:53 +00:00
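Illustration for 5d3eefe245 (not part of the commit): a rough sketch of the "queue now, free later" pattern the message describes. The demo_* names are hypothetical, free() stands in for returning an object to a pool, and the interrupt-level protection a real kernel needs around the queue is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object that may be released from interrupt context. */
struct demo_dep {
	struct demo_dep *next;	/* link used only while awaiting release */
};

static struct demo_dep *demo_pending;	/* objects waiting to be freed */

/* Called where freeing is not allowed: just remember the object. */
static void
demo_defer_free(struct demo_dep *dep)
{
	dep->next = demo_pending;
	demo_pending = dep;
}

/* Called periodically from a safe context: release everything queued. */
static int
demo_process_pending(void)
{
	int n = 0;

	while (demo_pending != NULL) {
		struct demo_dep *dep = demo_pending;

		demo_pending = dep->next;
		free(dep);
		n++;
	}
	return n;	/* number of objects actually released */
}
```

In the commit, the periodic caller is softdep_process_worklist(); here any caller that runs outside interrupt context would play that role.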
chs adf5d360a7 add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl.  file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value.  the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
2001-09-15 16:12:54 +00:00
lukem 5c2ee5861d Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
  set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
  contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
  the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick    2001/04/10 01:39:00 PDT
  Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
  His description of the problem and solution follow. My own tests show
  speedups on typical filesystem intensive workloads of 5% to 12% which
  is very impressive considering the small amount of code change involved.

  ------

    One day I noticed that some file operations run much faster on
  small file systems than on big ones. I've looked at the ffs
  algorithms, thought about them, and redesigned the dirpref algorithm.

    First I want to describe the results of my tests. These results are old
  and I have improved the algorithm after these tests were done. Nevertheless
  they show how big the performance speedup may be. I have done two file/directory
  intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
  The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
  The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
  It contains 6596 directories and 13868 files. The test systems are:

  1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
     test is at wd1. Size of test file system is 8 Gb, number of cg=991,
     size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
     from Dec 2000 with BUFCACHEPERCENT=35

  2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
     at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
     number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
     OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

  You can get more info about the test systems and methods at:
  http://www.ptci.ru/gluk/dirpref/old/dirpref.html

                                Test Results

               tar -xzf ports.tar.gz               rm -rf ports
    mode  old dirpref new dirpref speedup old dirpref new dirpref speedup
                               First system
   normal     667         472      1.41       477        331       1.44
   async      285         144      1.98       130         14       9.29
   sync       768         616      1.25       477        334       1.43
   softdep    413         252      1.64       241         38       6.34
                               Second system
   normal     329         81       4.06       263.5       93.5     2.81
   async      302         25.7    11.75       112          2.26   49.56
   sync       281         57.0     4.93       263         90.5     2.9
   softdep    341         40.6     8.4        284          4.76   59.66

  "old dirpref" and "new dirpref" columns give a test time in seconds.
  speedup - the speed increase as a ratio, i.e. old dirpref / new dirpref.

  ------

  Algorithm description

  The old dirpref algorithm is described in comments:

  /*
   * Find a cylinder to place a directory.
   *
   * The policy implemented by this algorithm is to select from
   * among those cylinder groups with above the average number of
   * free inodes, the one with the smallest number of directories.
   */

  A new directory is allocated in a different cylinder group than its
  parent directory, resulting in a directory tree that is spread across
  all the cylinder groups. This spreading out results in non-optimal
  access to the directories and files. When we have a small filesystem
  it is not a problem, but when the filesystem is big the performance
  degradation becomes very apparent.

  What do I mean by a big file system?

    1. A big filesystem is a filesystem which occupies 20-30 or more percent
       of total drive space, i.e. first and last cylinder are physically
       located relatively far from each other.
    2. It has a relatively large number of cylinder groups, for example
       more cylinder groups than 50% of the buffers in the buffer cache.

  The first results in long access times, while the second results in
  many buffers being used by metadata operations. Such operations use
  cylinder group blocks and on-disk inode blocks. The cylinder group
  block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
  It is 2k in size for the default filesystem parameters. If new and
  parent directories are located in different cylinder groups then the
  system performs more input/output operations and uses more buffers.
  On filesystems with many cylinder groups, lots of cache buffers are
  used for metadata operations.

  My solution for this problem is very simple. I allocate many directories
  in one cylinder group. I also do some things, so that the new allocation
  method does not cause excessive fragmentation and all directory inodes
  will not be located at a location far from its file's inodes and data.
  The algorithm is:
  /*
   * Find a cylinder group to place a directory.
   *
   * The policy implemented by this algorithm is to allocate a
   * directory inode in the same cylinder group as its parent
   * directory, but also to reserve space for its files inodes
   * and data. Restrict the number of directories which may be
   * allocated one after another in the same cylinder group
   * without intervening allocation of files.
   *
   * If we allocate a first level directory then force allocation
   * in another cylinder group.
   */

    My early versions of dirpref gave me good results for a wide range of
  file operations and different filesystem capacities except one case:
  those applications that create their entire directory structure first
  and only later fill this structure with files.

    My solution for such and similar cases is to limit the number of
  directories which may be created one after another in the same cylinder
  group without intervening file creations. For this purpose, I allocate
  an array of counters at mount time. This array is linked to the superblock
  fs->fs_contigdirs[cg]. Each time a directory is created the counter
  increases and each time a file is created the counter decreases. A 60Gb
  filesystem with 8mb/cg requires 10kb of memory for the counters array.

    The maxcontigdirs is a maximum number of directories which may be created
  without an intervening file creation. I found in my tests that the best
  performance occurs when I restrict the number of directories in one cylinder
  group such that all its files may be located in the same cylinder group.
  There may be some deterioration in performance if all the file inodes
  are in the same cylinder group as its containing directory, but their
  data partially resides in a different cylinder group. The maxcontigdirs
  value is calculated to try to prevent this condition. Since there is
  no way to know how many files and directories will be allocated later
  I added two optimization parameters in superblock/tunefs. They are:

          int32_t  fs_avgfilesize;   /* expected average file size */
          int32_t  fs_avgfpdir;      /* expected # of files per directory */

  These parameters have reasonable defaults but may be tweaked for special
  uses of a filesystem. They are only necessary in rare cases like better
  tuning a filesystem being used to store a squid cache.

  I have been using this algorithm for about 3 months. I have done
  a lot of testing on filesystems with different capacities, average
  filesize, average number of files per directory, and so on. I think
  this algorithm has no negative impact on filesystem performance. It
  works better than the default one in all cases. The new dirpref
  will greatly improve untarring/removing/copying of big directories,
  decrease load on cvs servers and much more. The new dirpref doesn't
  speed up a compilation process, but also doesn't slow it down.

  Obtained from:	Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse     2001/04/23 17:37:17 PDT
  Pre-dirpref versions of fsck may zero out the new superblock fields
  fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
  panics if these fields were zeroed while a filesystem was mounted
  read-only, and then remounted read-write.

  Add code to ffs_reload() which copies the fs_contigdirs pointer
  from the previous superblock, and reinitialises fs_avgf* if necessary.

  Reviewed by:	mckusick
=====

=====
nik         2001/04/10 03:36:44 PDT
  Add information about the new options to newfs and tunefs which set the
  expected average file size and number of files per directory.  Could do
  with some fleshing out.
=====
2001-09-06 02:16:00 +00:00
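Illustration for 5c2ee5861d (not part of the commit): a small sketch of the per-cylinder-group counters that Orlov's message describes, where creating a directory raises the counter and creating a file lowers it. The demo_* names, the fixed group count and the exact policy check are assumptions for illustration; the real ffs_dirpref() weighs more factors. The counters are 16-bit here, following the u_int16_t choice mentioned in the NetBSD message.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NCG 8	/* hypothetical number of cylinder groups */

/* One counter per cylinder group, allocated at mount time in the real code. */
static uint16_t demo_contigdirs[DEMO_NCG];

/* A directory was created in `cg': one more directory in a row. */
static void
demo_dir_created(int cg)
{
	if (demo_contigdirs[cg] < UINT16_MAX)
		demo_contigdirs[cg]++;
}

/* A file was created in `cg': the run of directories is interrupted. */
static void
demo_file_created(int cg)
{
	if (demo_contigdirs[cg] > 0)
		demo_contigdirs[cg]--;
}

/* May another directory be placed in `cg' without intervening files? */
static int
demo_cg_allows_dir(int cg, unsigned int maxcontigdirs)
{
	return demo_contigdirs[cg] < maxcontigdirs;
}
```

With a limit of 2, two consecutive directory creations close the group to further directories until a file creation lowers the counter again.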
lukem c50eb8cc85 deprecate fs_fscktime; we never used it.
in an effort to maintain compatibility with freebsd/openbsd/whatever,
i'm attempting to get the superblock format in sync, and freebsd uses
the int32_t at this position for `fs_pendinginodes'.

if we ever decide to implement fscktime functionality, we'll:
a) make sure to liaise with the other projects to reserve the same
   spare field
b) actually implement the code this time ...

(this is also preparing us for other changes, like the new dirpref code)
2001-09-03 14:52:17 +00:00
lukem e3ba61f9f3 Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code.  From the FreeBSD commit message:

	The ffs superblock includes a 128-byte region for use by temporary
	in-core pointers to summary information. An array in this region
	(fs_csp) could overflow on filesystems with a very large number of
	cylinder groups (~16000 on i386 with 8k blocks). When this happens,
	other fields in the superblock get corrupted, and fsck refuses to
	check the filesystem.

	Solve this problem by replacing the fs_csp array in 'struct fs'
	with a single pointer, and add padding to keep the length of the
	128-byte region fixed. Update the kernel and userland utilities
	to use just this single pointer.

	With this change, the kernel no longer makes use of the superblock
	fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
	to indicate that these fields must be calculated for compatibility
	with older kernels.

	Reviewed by:    mckusick
2001-09-02 01:58:30 +00:00
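Illustration for e3ba61f9f3 (not part of the commit): a back-of-the-envelope check of the "~16000 on i386 with 8k blocks" figure. It assumes the old fs_csp array held 31 pointers, each covering one file system block of 16-byte summary records; both constants are recalled from the historical fs.h and should be treated as assumptions, as should the demo_* names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NPTRS	31	/* assumed number of summary-block pointers */
#define DEMO_CSUM_SIZE	16	/* assumed size of one per-group summary record */

/* Largest number of cylinder groups the pointer array could describe. */
static uint32_t
demo_max_cg(uint32_t bsize)
{
	return DEMO_NPTRS * (bsize / DEMO_CSUM_SIZE);
}
```

Under those assumptions, 8192-byte blocks give 31 * 512 = 15872 groups, in line with the approximate limit quoted in the FreeBSD message.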
lukem 563fb2d03f no need to cast arg to lblktosize() any more 2001-08-31 03:38:45 +00:00
lukem 2bfd8a2678 More fixes from FreeBSD (with changes):
- Cast blk argument to lblktosize() to (off_t), to prevent 32 bit overflow.
  whilst almost every use in ffs used this for small blknos, there are
  potential issues, and it's safer this way.  (as discussed with chuq)
- Use 64bit (off_t) math to calculate if we have hit our freespace() limit.
  Necessary for coherent results on filesystems bigger than 0.5Tb.
- Use lblktosize() in blksize() and dblksize(), to make it obvious what's
  happening
- Remove sblksize() - nothing uses it
2001-08-31 03:15:45 +00:00
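Illustration for 2bfd8a2678 (not part of the commit): a sketch of why the block number is widened before shifting. The demo_* functions are hypothetical stand-ins for lblktosize(), int64_t stands in for off_t, and the shift of 13 assumes 8 KB blocks. The 32-bit variant uses unsigned arithmetic so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BSHIFT 13	/* log2 of an assumed 8 KB block size */

/* 32-bit arithmetic: the byte offset wraps once it reaches 4 GB. */
static uint32_t
demo_lblktosize32(uint32_t blk)
{
	return blk << DEMO_BSHIFT;
}

/* Widen to 64 bits first, then shift: large offsets survive. */
static int64_t
demo_lblktosize64(int32_t blk)
{
	return (int64_t)blk << DEMO_BSHIFT;
}
```

Block 524288 starts exactly at the 4 GB mark: the 32-bit version wraps to 0, while the widened version returns 4294967296.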
lukem 0cf1d74c5b be consistent when casting arg to lblktosize() in UVM_PAGE_TRKOWN debug code 2001-08-30 15:17:28 +00:00
lukem c56418af73 some improvements from freebsd/openbsd
- replace the unused fs_headswitch and fs_trkseek with fs_id[2], bringing
  our struct fs closer to that in freebsd & openbsd (& solaris FWIW)
- dumpfs: improve warning message when cpc == 0
2001-08-30 14:37:25 +00:00
lukem c535133897 - minor whitespace and comments cleanup
- replace "filesystem" with "file system"
- fix spelo (from freebsd)
2001-08-30 08:31:25 +00:00
chs 1de4b3e2e0 min() -> MIN() (on general principles) 2001-08-30 03:55:42 +00:00
chs eccd469cf7 min() -> MIN() 2001-08-30 03:47:53 +00:00
wiz 251b3464be heirarchy -> hierarchy 2001-08-24 10:24:45 +00:00
wiz 1e378c4c12 precede, not preceed. 2001-08-20 12:00:46 +00:00
chs f0af9f581b add getpages/putpages entries for spec vnodes. 2001-08-17 05:54:36 +00:00
lukem 1b81d6353d remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right.  invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B"  to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
2001-08-17 02:18:46 +00:00
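Illustration for 1b81d6353d (not part of the commit): a sketch of deciding the byte order from the magic number itself instead of trusting a caller-supplied flag. The value 0x011954 is the FFS magic number; the demo_* names and the three-way return value are hypothetical, and the actual test inside ffs_sb_swap() may differ.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FS_MAGIC 0x011954u	/* FFS superblock magic number */

/* Reverse the byte order of one 32-bit value. */
static uint32_t
demo_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Inspect the magic field as read from disk.
 * Returns 0 for native byte order, 1 if the superblock needs swapping,
 * and -1 if the value is not an FFS magic number in either order.
 */
static int
demo_needswap(uint32_t magic)
{
	if (magic == DEMO_FS_MAGIC)
		return 0;
	if (magic == demo_bswap32(DEMO_FS_MAGIC))
		return 1;
	return -1;
}
```

Because the decision depends only on the superblock contents, a caller can no longer pass the wrong swap flag, which is the failure mode the message describes.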
lukem ed54fa2d76 correctly cast arguments to scanc() 2001-08-09 08:16:42 +00:00
lukem 1a2d5cf412 be consistent and use "u_char" instead of "unsigned char" 2001-08-09 08:15:26 +00:00
lukem a73aa816f3 get argument name correct in comment describing vop_balloc_args 2001-08-08 08:36:36 +00:00
lukem 3cd4afc9e3 - multiple include protection
- pull in <ufs/ufs/dinode.h> for ufs_daddr_t
- mark a few fields as being "UNUSED" (because they are)
2001-07-27 01:28:06 +00:00
lukem 714cac851d if printing the value of fs_clean, say 'fs_clean' instead of 'fs_flags' ... 2001-07-26 07:58:55 +00:00
chs d8bbc51566 fix an error case for quotas. 2001-06-03 16:49:07 +00:00
mrg 67afbd6270 use _KERNEL_OPT 2001-05-30 11:57:16 +00:00
sommerfeld d02dde9937 Change ffs_dirpref() to pay attention to the amount of available free
space before deciding which cylinder group should contain a new directory
inode.

Fixes kern/11983; works around some, but not all, of the side effects
of kern/11989.

Tested by me for well over a month on my laptop; preliminary versions of
the fix were tested by Frank van der Linden and Herb Peyerl.
2001-03-13 21:16:23 +00:00
eeh d0eaafc17f Use int32_t for on-disk time_t values. 2001-02-23 02:25:10 +00:00
chs 31f045ca75 remove debug code that was left in by accident. 2001-02-07 22:40:06 +00:00
chs a1c22f6d67 add casts to an assertion in ffs_alloc() so it works with offsets past 4GB. 2001-02-05 10:55:02 +00:00
augustss 46ee162100 Fix from chuq:
don't update UVM's notion of the file size before the VOP_FSYNC() when
we're partially truncating a file with softdeps enabled.  doing so could
free pages without updating the dependency info, which would result in
"panic: softdep_write_inodeblock: direct pointer #1 mismatch 0 != N".
2001-01-27 04:23:21 +00:00
jdolecek d9466585b7 make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const 2001-01-22 12:17:35 +00:00
jdolecek 34c8ae80da constify 2001-01-18 20:28:15 +00:00
mycroft fad85a24d8 On a RW->RO transition, explicitly clear fs_fmod after the cgupdate/sbupdate,
to prevent spurious writebacks and whinging about the (correct!) clean flag.
(Why this isn't done in ffs_sbupdate(), I dunno...)
2001-01-10 17:49:18 +00:00
ad d8735dd13a RCS ID 2001-01-10 16:45:56 +00:00
chs bc21905f3c attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
2001-01-10 04:47:10 +00:00
mycroft 7f2aa054f1 ffs_reload(): Copy fs_ronly into the new superblock, too, as it may have been
modified on disk (e.g. by fsck(8)).  This flag should really be elsewhere.
2001-01-09 10:44:19 +00:00
matt ad346bb9eb Convert a MALLOC with a variable size to malloc(). Saves 220 bytes of text
on VAX.
2001-01-01 05:17:26 +00:00
enami 95a1bfa14c - 16 * 8 != 168
- offset should be endian independent.
2000-12-23 14:42:06 +00:00
enami 0e4a3d44c0 Cosmetic changes 2000-12-23 14:09:52 +00:00
mycroft 61a6479ab1 Patch from Kirk McKusick to fix an ordering problem in softdep_setup_freeblks()
that could cause an inode to be reused prematurely (possibly resulting in the
file containing garbage blocks).
2000-12-13 20:07:32 +00:00
chs e6e27e9efc fix bookkeeping for page cache dependency buffers. 2000-12-13 15:32:31 +00:00
chs bb61d9c5e4 in flush_inodedep_deps(), drop the big softdep lock while flushing pages. 2000-12-11 03:53:54 +00:00
chs 4ab33e73c2 call pgo_flush with (start,end) rather than (start,length). 2000-12-10 19:41:35 +00:00
chs 4912461b20 in ffs_sync(), don't skip vnodes which have (potentially dirty) pages. 2000-12-04 09:37:06 +00:00
fvdl 7c2b9d8515 In addition to setting the softdep flag in the superblock when
mounting with softdeps, also explicitly clear it when we don't,
so that a leftover setting after a crash will be cleared.
2000-12-03 19:52:06 +00:00
nathanw aa215181ce Don't set the value of doreallocblks here; it's defined over in vfs_cluster.c
In fact, doreallocblks isn't used here at all. Delete the declaration.
2000-11-30 20:56:10 +00:00
jdolecek 861369604d change vfs.ffs.doreallocblks to 1 by default - this does not have
any bad symptoms any more, fix for bug causing problems with this
option was in BSD4.4-Lite2 and pulled in together with softdep changes

See also Keith Smith & Margo Seltzer's paper on the topic at
http://www.eecs.harvard.edu/~keith/papers/realloc.ps.gz
2000-11-30 19:46:02 +00:00
chs e9037d16c5 allow building without SOFTDEP by adding the pageiodone hook to bio_ops. 2000-11-27 18:26:38 +00:00
chs aeda8d3b77 Initial integration of the Unified Buffer Cache project. 2000-11-27 08:39:39 +00:00
ad 642267bcc7 Update for hashinit() change. 2000-11-08 14:28:12 +00:00
fvdl ef6bdbccd8 Stay at splbio across the VBWAIT loop, as is done elsewhere in the
kernel. Avoids a possible race condition. Pointed out by
enami@netbsd.org, problem reported by deberg@netbsd.org.
2000-10-24 14:43:32 +00:00
simonb 7bf589b1ae There is no need to explicitly include <uvm/uvm_extern.h> for
<sys/sysctl.h> anymore.
2000-10-13 16:40:26 +00:00
fvdl 81ba8e7ff7 Adapt for VOP_FSYNC parameter change.
Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
2000-09-19 22:04:08 +00:00
fvdl ce4bcf47f3 Do not call MALLOC with M_WAITOK while holding the "lock". Thanks to
Ethan Solomita for the reminder.

Mark the parent vnode lock as recursive while flushing pagedeps. XXX.
Should fix kern/10564.
2000-08-15 14:25:08 +00:00
mrg 419501093a remove include of <vm/vm.h> and <uvm/uvm_extern.h> 2000-06-28 14:16:37 +00:00
mrg 91cc436b9e <vm/vm.h> -> <uvm/uvm_extern.h> 2000-06-28 14:11:33 +00:00
fvdl d09958adad Due to popular demand, change vinsheadfree to ungetnewvnode to make
the name clearer. No functional change.
2000-06-27 23:51:22 +00:00
fvdl bba2403203 In ffs_vget, do not hold ufs_haslock across the call to getnewvnode.
We may sleep in it, or even recurse, with softdeps. Instead, grab
the lock later, but check if no one else has beaten us to the VFS_VGET
operation, and if so, roll back getnewvnode using vinsheadfree, and
just return.
2000-06-27 23:39:17 +00:00
pk 88b0328aca We shouldn't be defining DEBUG and DIAGNOSTIC on our own; these may have
unwanted side-effects in the header files. For now, do the internal
#defines after including the headers.
2000-06-27 16:46:54 +00:00
fvdl 45b3f2405a Moved here from gnu/sys/ufs/ffs 2000-06-22 16:13:41 +00:00
fvdl 77b2bcbe07 Copyright changed. 2000-06-22 15:23:05 +00:00
perseant da29133e76 make it compile (fix typo) 2000-06-16 05:45:14 +00:00
matt 1b5bc7ce61 ignore the softdep flags when mounting and there's no softdep in the kernel. 2000-06-16 00:30:15 +00:00
fvdl 4f11634756 Allow MNT_SOFTDEP to be passed in via the mount(2) system call, do not
require it to be set via tunefs(8). Silently ignore it when doing
an update mount of a writeable filesystem, the FFS/softdep code isn't ready
for this yet.
2000-06-15 22:35:37 +00:00
mycroft 64f5a574a7 In ffs_update():
* Move the clearing of IN_MODIFIED and IN_ACCESSED later, so they are not
  cleared if the bread() failed.
* Explicitly set waitfor to 0 in the softdep case, if IN_MODIFIED is not
  set (mirroring the bwrite()/bdwrite() decision).
2000-05-30 17:23:52 +00:00
mycroft 4db674fa50 According to Frank, buffers with dependencies *are* left on v_dirtyblks, so
remove the FSYNC_RECLAIM check and force them to be flushed.
2000-05-29 18:53:35 +00:00
mycroft edfd1e6f32 Use LIST_{FIRST,NEXT,EMPTY}(). 2000-05-29 18:28:48 +00:00
mycroft d747ada9c2 Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated.  In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
2000-05-29 18:04:30 +00:00
mycroft 941524439a Never call softdep_sync_metadata() in the FSYNC_RECLAIM case. Any pending
blocks are detached from the vnode at this point.  When the dependencies are
broken to enable writing the blocks, the vnode will be regenerated.  (The only
reason we sync buffers in this case is that they have to be detached from the
vnode.)
2000-05-29 17:19:20 +00:00
mycroft c47adf55e0 In ffs_fsync(), remove the FSYNC_RECLAIM special case, so that it properly
waits for pending buffers, and doesn't throw away time stamp updates.
2000-05-29 17:12:06 +00:00
mycroft ccf1cf4b69 MNT_WAIT -> FSYNC_WAIT 2000-05-29 16:28:27 +00:00
mycroft 1ea529f6df DTRT when unwinding multiple levels. 2000-05-28 08:31:41 +00:00
mycroft 4fc7b946c2 When unwinding a failed allocation, make sure to nuke the unwound block from
the vnode's block list.  This fixes `itrunc3' panics (at least in some cases;
further testing is needed) and prevents further lossage later on.
2000-05-28 08:15:40 +00:00
mycroft 4656dfd24f Add a new function to remove extra buffers when truncating a file. This is
more generic than the vinvalbuf(V_SAVEMETA) case, avoiding synchronous
operations when truncating to a non-zero length.
2000-05-28 04:13:56 +00:00
thorpej 21fc65e1a8 sleep() -> tsleep() 2000-05-27 04:52:27 +00:00
thorpej f636538446 NULL != 0 2000-05-19 04:34:39 +00:00
bouyer 1900598507 Sync copyright notice. 2000-05-15 08:51:55 +00:00
perseant f0728fdce1 Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags").  Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously.  At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
2000-05-13 23:43:06 +00:00
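Illustration for f0728fdce1 (not part of the commit): a sketch of replacing a boolean "waitfor" with a flag word. The flag names follow the message, but the numeric values, the demo_* names and the policy shown (a file system that does not need synchronous directory operations ignores the wait request for dirop calls) are assumptions made only for this example.

```c
#include <assert.h>

#define DEMO_UPDATE_WAIT	0x01	/* caller would like a synchronous update */
#define DEMO_UPDATE_DIROP	0x02	/* hint: call comes from a directory operation */

/*
 * Decide whether an update should block.
 * `dirops_are_sync' is nonzero for a file system that wants directory
 * operations written synchronously, zero for one that does not.
 */
static int
demo_should_wait(int flags, int dirops_are_sync)
{
	if ((flags & DEMO_UPDATE_WAIT) == 0)
		return 0;
	if ((flags & DEMO_UPDATE_DIROP) != 0 && !dirops_are_sync)
		return 0;
	return 1;
}
```

The extra bit lets each file system make its own choice, instead of every caller wrapping the call in a file-system-specific test.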
jdolecek c78399fc88 Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged into syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
2000-04-04 09:23:20 +00:00
augustss 169ac5b3c1 Remove register declarations. 2000-03-30 12:41:09 +00:00
simonb c2e5560a03 Delete redundant decls of rootvp - it's in <sys/systm.h>.
Delete redundant decl of ffs_sbupdate() - it's in <ufs/ffs/ffs_extern.h>.
2000-03-30 02:48:22 +00:00
jdolecek a6cb6fe4ee Log the optimization changes only if DEBUG. Fixes kern/9697 2000-03-29 08:46:57 +00:00
simonb 0fd09c8496 Don't need to include <sys/conf.h> here. 2000-03-29 03:43:33 +00:00
fvdl 512503c606 If we're reclaiming, and there are no dirty blocks, just return. 2000-03-17 01:26:52 +00:00
jdolecek 03efc0b2b7 Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
2000-03-16 18:20:06 +00:00
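Illustration for 03efc0b2b7 (not part of the commit): a sketch of counting init calls so that shared resources are set up once and torn down only by the last user, as the message describes for ffs_init()/ffs_done() being shared with mfs. All demo_* names are hypothetical, and a single flag stands in for the real resources.

```c
#include <assert.h>

static int demo_init_count;	/* how many users have called demo_init() */
static int demo_resources_live;	/* stands in for the shared resources */

/* Set up shared resources on the first call only. */
static void
demo_init(void)
{
	if (demo_init_count++ > 0)
		return;
	demo_resources_live = 1;
}

/* Tear down shared resources when the last user is done. */
static void
demo_done(void)
{
	if (--demo_init_count > 0)
		return;
	demo_resources_live = 0;
}
```

Two users may call demo_init() in any order; the resources stay live until the second demo_done().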
fvdl 1c78f3708b Initialize the fs variable struct a little earlier to avoid referencing
a bad pointer in a printf. Problem reported by Krister Walfridsson.
2000-03-16 10:37:00 +00:00
fvdl e3dbad5a3c Revert this back to 2 revisions ago, these checks are done higher up now. 2000-03-15 16:31:52 +00:00
fvdl 563d336e44 Don't immediately return in ffs_fsync if there appears to be no data
to flush if it's a vnode on a softdep filesystem. softdep_sync_metadata
may still need to do some work.
2000-03-14 13:06:29 +00:00
perseant 61fa9e1409 Move vinvalbuf's check for dirty blocks into ffs_fsync, to ensure that
mode and ownership bits are flushed to disk before the vnode is
reclaimed.

The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk.  This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.

Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
2000-03-11 05:00:18 +00:00
fvdl 89670eb646 Fix a bug introduced in Lite2 with block allocation and full disk
conditions. Reported by Ian Dowse <iedowse@maths.tcd.ie>, based
on patch in FreeBSD reviewed by Kirk McKusick.
2000-02-25 19:58:25 +00:00
fvdl fe39281ea4 Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
2000-02-14 22:00:21 +00:00
bouyer 3c680c00ab Handle pre-FS_42POSTBLFMT. I now can mount an Ultrix file system on my
sparc without panic.
2000-01-18 18:41:29 +00:00
drochner 800b976584 Call ffs_oldfscompat() before all the consistency checks, to avoid the
use of uninitialized data in the checks if the filesystem is an old one.
1999-12-10 14:36:04 +00:00
fvdl 0b1963121a Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
1999-11-15 18:49:07 +00:00
enami fee96e1746 Check if the type of device node isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes null
pointer dereference.
1999-10-20 14:32:09 +00:00
wrstuden e682a080e9 In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which oftentimes won't happen, as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
1999-10-16 23:53:26 +00:00
thorpej 29df848753 Need <string.h> for memcpy(3) prototype if building from userland. 1999-09-14 04:50:54 +00:00
wrstuden 3bf14d81e9 Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
1999-08-03 20:19:16 +00:00
drochner 12a6593f79 clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit 1999-08-03 19:22:43 +00:00
wrstuden 976aedb7ac Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
1999-07-17 01:08:28 +00:00
wrstuden 379a26972f Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
1999-07-08 01:05:58 +00:00
mrg d2397ac5f7 completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
1999-03-24 05:50:49 +00:00
mycroft b174019ccc Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
1999-03-05 21:09:48 +00:00
mycroft 86ed73efb4 Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
1999-03-05 20:47:06 +00:00
bouyer 00d7241e81 Don't check fs_bsize before the superblock has been swapped if needed.
Check value of sbsize before allocating memory with this value.
1999-03-05 12:02:18 +00:00
wrstuden 862a56e88b Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
1999-02-26 23:44:43 +00:00
bouyer 22d556f6cf Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
1999-02-10 13:14:08 +00:00
bouyer cdbe530495 No need to #include malloc.h here. 1998-12-04 11:02:30 +00:00
bouyer 3efc699962 Sanity check a few values in the superblock, to avoid mallocing huge
memory area if we try to mount a corrupted filesystem. Fixes kern/3933.
1998-12-04 11:00:40 +00:00
thorpej 1fcae7f1be defopt FFS_EI 1998-11-12 19:51:10 +00:00
mycroft 1a5b9c6c30 Do not corrupt file flags when file system is full! 1998-10-27 21:32:58 +00:00
thorpej f7948d05a1 Use DINODE_SIZE rather than pointer arithmetic. 1998-10-23 00:31:28 +00:00
christos e62c8fd824 Missed a conditional for FFS_EI; appears when we compile without -Ox 1998-10-04 18:07:57 +00:00
thorpej 60cfe320cc Use the pool allocator and the "nointr" pool page allocator for FFS inodes.
XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
1998-09-01 03:11:08 +00:00
thorpej 39f683419f Back out part of last change (uninitialized work-around). 1998-08-18 18:15:41 +00:00
thorpej 6fc90a1a4d Add some braces to make egcs happy (ambiguous else warning). Also,
deal with bogus uninitialized warning (__noreturn__ related)
1998-08-18 06:47:53 +00:00
perry 27ca6798df bzero->memset, bcopy->memcpy, bcmp->memcmp 1998-08-09 20:15:38 +00:00
drochner 2dcc522f1d The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
1998-07-28 17:30:01 +00:00
mycroft 829367f279 Omit some externs if not _KERNEL. 1998-07-28 04:17:51 +00:00
jonathan d275e56dee * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
1998-07-05 08:49:30 +00:00
sommerfe 7ba7fbbb23 Always include fifos; "not an option any more". 1998-06-24 20:58:44 +00:00
sommerfe becaafeea0 defopt for options FIFO 1998-06-22 22:00:59 +00:00
kleink 2d869bbacf KNF, mostly of FFS_EI changes. 1998-06-13 16:26:22 +00:00
cgd 651b44e211 Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install.  (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.)  The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change.  Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
1998-06-12 23:22:30 +00:00
kleink 74ce7ac984 KNF: only include one of <sys/{param,types}.h>, not both. 1998-06-10 15:57:39 +00:00
scottr 7171cca4b8 Protect various config(8)-generated files from inclusion while
building LKMs.  Fixes PR 5557.
1998-06-09 07:46:31 +00:00
ragge 1a66918fc0 Wrong include file order; caused compile error on vax. 1998-06-08 17:59:08 +00:00
scottr d48f258f90 Use the newly-defined opt_quota.h. 1998-06-08 04:27:50 +00:00
kleink 382743ada3 Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
1998-06-05 19:53:00 +00:00
ross ac5774c288 Fix a 64-bit pointer/int warning. 1998-03-19 03:42:35 +00:00
bouyer 091dafd39f Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
1998-03-18 15:57:26 +00:00
fvdl e5bc90f40c Merge with Lite2 + local changes 1998-03-01 02:20:01 +00:00
thorpej b5bf2ed6d0 Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
1998-02-18 07:05:47 +00:00
mrg d90485202c - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
1998-02-10 14:08:44 +00:00
mrg 1a8c7604f4 initial import of the new virtual memory system, UVM, into -current.
UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code.  i provided some help
getting swap and paging working, and other bug fixes/ideas.  chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly.  :-)
1998-02-05 07:59:28 +00:00
mjacob 949982f2a4 In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
1997-10-16 18:29:11 +00:00
fvdl e351013e56 Fix messed up RCS Id. 1997-07-22 14:36:31 +00:00
fvdl 5d96e77ef6 Get locking around inode hashing right. 1997-07-07 23:37:36 +00:00
fvdl acffafa288 Oops, I messed up the lock. Reverting it until I have time to fix it,
to avoid people getting into trouble after the supscan hits.
1997-07-07 11:47:06 +00:00