Commit Graph

231 Commits

Author SHA1 Message Date
perseant
8886b0f4b2 Phase one of my three-phase plan to make LFS play nice with UBC, and bug-fixes
I found while making sure there weren't any new ones.

* Make the write clusters keep track of the buffers whose blocks they contain.
  This should make it possible to (1) write clusters using a page mapping
  instead of malloc, if desired, and (2) schedule blocks for rewriting
  (somewhere else) if a write error occurs.  Code is present to use
  pagemove() to construct the clusters but that is untested and will go away
  anyway in favor of page mapping.
* DEBUG now keeps a log of Ifile writes, so that any lingering instances of
  the "dirty bufs" problem can be properly debugged.
* Keep track of whether the Ifile has been dirtied by various routines that
  can be called by lfs_segwrite, and loop on that until it is clean, for
  a checkpoint.  Checkpoints need to be squeaky clean.
* Warn the user (once) if the Ifile grows larger than is reasonable for their
  buffer cache.  Both lfs_mountfs and lfs_unmount check since the Ifile can
  grow.
* If an inode is not found in a disk block, try rereading the block, under
  the assumption that the block was copied to a cluster and then freed.
* Protect WRITEINPROG() with splbio() to fix a hang in lfs_update.
2002-05-14 20:03:53 +00:00
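
The "loop until the Ifile is clean" idea in the commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct toy_fs`, `toy_write_ifile()`, and `toy_checkpoint()` are hypothetical names invented here, not the LFS code. The point is that writing the Ifile may itself re-dirty it, so a checkpoint repeats the write until a pass leaves it clean.

```c
#include <stdbool.h>

/* Hypothetical toy state; none of these fields exist in LFS. */
struct toy_fs {
    int  pending;      /* side effects that will still re-dirty the Ifile */
    bool ifile_dirty;  /* set whenever a routine dirties the Ifile */
    int  passes;       /* number of Ifile writes performed so far */
};

/* Write the Ifile once. As a stand-in for the "various routines" that
 * the segment writer may call, each write re-dirties the Ifile while
 * there is pending work left. */
static void toy_write_ifile(struct toy_fs *fs)
{
    fs->ifile_dirty = false;
    fs->passes++;
    if (fs->pending > 0) {
        fs->pending--;
        fs->ifile_dirty = true;
    }
}

/* Checkpoint: keep writing until a pass leaves the Ifile clean.
 * Returns the total number of passes recorded in fs. */
static int toy_checkpoint(struct toy_fs *fs)
{
    do {
        toy_write_ifile(fs);
    } while (fs->ifile_dirty);
    return fs->passes;
}
```

The loop terminates in this model because each pass consumes one unit of pending work; a real implementation would need its own argument for why the Ifile eventually stops being dirtied.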
matt
0cb85bc7b9 Eliminate commons. 2002-05-12 23:06:27 +00:00
perseant
76d2795556 Make exported LFSes not panic on the first file create. 2002-04-27 01:00:46 +00:00
thorpej
a180cee23b Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map).  Try to deal with this:

* Group all information about the backend allocator for a pool in a
  separate structure.  The pool references this structure, rather than
  the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
  to become available, but will still fail if it cannot allocate KVA
  space for the pages.  If this happens, carefully drain all pools using
  the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
  some pages, and use that information to make draining easier and more
  efficient.
* Get rid of PR_URGENT.  There was only one use of it, and it could be
  dealt with by the caller.

From art@openbsd.org.
2002-03-08 20:48:27 +00:00
perseant
f41358613c Include the space taken by inodes in the count made by lfs_check();
make VOP_SETATTR call lfs_check.  This prevents large numbers of inode
changes (say, at the end of tar(1)) from filling the buffer cache.
2002-02-11 02:47:29 +00:00
perseant
8ded9a2c7d Correct free list tail pointer, when adding blocks of new inodes to v2
filesystems.  Should fix PR #14408.
2002-02-04 03:32:16 +00:00
chs
0d70d731c2 use the new compatibility routines to allow mmap() to work
(in the same non-coherent fashion that it worked pre-UBC)
until someone has time to do it the right way.
2001-12-18 07:51:16 +00:00
chs
a106161b5a add spaces for KNF. confirmed to produce identical objects. 2001-11-23 21:44:25 +00:00
lukem
2565646230 don't need <sys/types.h> when including <sys/param.h> 2001-11-15 09:47:59 +00:00
lukem
ec6245465a add RCSID 2001-11-08 02:39:06 +00:00
simonb
c56d879335 Remove some variables that are set but never used. 2001-11-06 07:11:29 +00:00
lukem
99147a7648 remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former.  leave the former
in source that obviously uses specific bits of it (for completeness.)
2001-10-26 05:56:06 +00:00
chs
a2e3e57398 initialize the vnode's copy of the size in lfs_ialloc(). 2001-10-14 19:06:16 +00:00
chs
80373b7e54 don't depend on other headers to include sys/proc.h for us. 2001-09-28 11:59:51 +00:00
sommerfeld
181c4513dc Add fifo_putpages() placebo so that the vnode's uobj is unlocked. 2001-09-22 22:35:18 +00:00
chs
64c6d1d2dc a whole bunch of changes to improve performance and robustness under load:
 - remove special treatment of pager_map mappings in pmaps.  this is
   required now, since I've removed the globals that expose the address range.
   pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
   no longer any need to special-case it.
 - eliminate struct uvm_vnode by moving its fields into struct vnode.
 - rewrite the pageout path.  the pager is now responsible for handling the
   high-level requests instead of only getting control after a bunch of work
   has already been done on its behalf.  this will allow us to UBCify LFS,
   which needs tighter control over its pages than other filesystems do.
   writing a page to disk no longer requires making it read-only, which
   allows us to write wired pages without causing all kinds of havoc.
 - use a new PG_PAGEOUT flag to indicate that a page should be freed
   on behalf of the pagedaemon when it's unlocked.  this flag is very similar
   to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
   pageout fails due to, e.g., an indirect-block buffer being locked.
   this allows us to remove the "version" field from struct vm_page,
   and together with shrinking "loan_count" from 32 bits to 16,
   struct vm_page is now 4 bytes smaller.
 - no longer use PG_RELEASED for swap-backed pages.  if the page is busy
   because it's being paged out, we can't release the swap slot to be
   reallocated until that write is complete, but unlike with vnodes we
   don't keep a count of in-progress writes so there's no good way to
   know when the write is done.  instead, when we need to free a busy
   swap-backed page, just sleep until we can get it busy ourselves.
 - implement a fast-path for extending writes which allows us to avoid
   zeroing new pages.  this substantially reduces cpu usage.
 - encapsulate the data used by the genfs code in a struct genfs_node,
   which must be the first element of the filesystem-specific vnode data
   for filesystems which use genfs_{get,put}pages().
 - eliminate many of the UVM pagerops, since they aren't needed anymore
   now that the pager "put" operation is a higher-level operation.
 - enhance the genfs code to allow NFS to use the genfs_{get,put}pages
   instead of a modified copy.
 - clean up struct vnode by removing all the fields that used to be used by
   the vfs_cluster.c code (which we don't use anymore with UBC).
 - remove kmem_object and mb_object since they were useless.
   instead of allocating pages to these objects, we now just allocate
   pages with no object.  such pages are mapped in the kernel until they
   are freed, so we can use the mapping to find the page to free it.
   this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
2001-09-15 20:36:31 +00:00
chs
adf5d360a7 add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl.  file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value.  the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
2001-09-15 16:12:54 +00:00
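
As a rough illustration of resizing a hash table built from the <sys/queue.h> list macros, the sketch below rehashes every entry into a freshly sized bucket array. All `toy_*` names are hypothetical and the hash function (low bits of an inode number) is an assumption made for the example; this is not the code of any NetBSD file system, and it uses malloc() where a kernel would use its own allocator.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical hash entry keyed by an inode number. */
struct toy_node {
    unsigned long ino;
    LIST_ENTRY(toy_node) hash;
};
LIST_HEAD(toy_bucket, toy_node);

struct toy_table {
    struct toy_bucket *buckets;
    unsigned long mask;          /* nbuckets - 1; nbuckets is a power of two */
};

static int toy_table_init(struct toy_table *t, unsigned long nbuckets)
{
    unsigned long i;

    t->buckets = malloc(nbuckets * sizeof(*t->buckets));
    if (t->buckets == NULL)
        return -1;
    for (i = 0; i < nbuckets; i++)
        LIST_INIT(&t->buckets[i]);
    t->mask = nbuckets - 1;
    return 0;
}

static void toy_insert(struct toy_table *t, struct toy_node *n)
{
    LIST_INSERT_HEAD(&t->buckets[n->ino & t->mask], n, hash);
}

static struct toy_node *toy_lookup(struct toy_table *t, unsigned long ino)
{
    struct toy_node *n;

    LIST_FOREACH(n, &t->buckets[ino & t->mask], hash)
        if (n->ino == ino)
            return n;
    return NULL;
}

/* Resize: build the new bucket array first, then move every node over,
 * so a failed allocation leaves the old table untouched. */
static int toy_reinit(struct toy_table *t, unsigned long nbuckets)
{
    struct toy_table nt;
    struct toy_node *n;
    unsigned long i;

    if (toy_table_init(&nt, nbuckets) != 0)
        return -1;
    for (i = 0; i <= t->mask; i++) {
        while ((n = LIST_FIRST(&t->buckets[i])) != NULL) {
            LIST_REMOVE(n, hash);
            toy_insert(&nt, n);
        }
    }
    free(t->buckets);
    *t = nt;
    return 0;
}
```

Entries that shared a bucket in the small table may land in different buckets after the resize, which is why each node is rehashed rather than moving whole chains.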
chs
cb3b720183 disable mmap() for LFS until it is fixed. 2001-08-24 06:42:46 +00:00
chs
f0af9f581b add getpages/putpages entries for spec vnodes. 2001-08-17 05:54:36 +00:00
jdolecek
58ed62e500 Constrain 'blkcnt' of lfs_markv() syscall by 64KB. Reviewed by
Konrad Schroder <perseant@NetBSD.org>.
2001-08-03 06:02:42 +00:00
jdolecek
bd21ec5d2e lfs_writeseg(): make el_size a size_t (cosmetic only, no functional change) 2001-07-26 20:20:15 +00:00
assar
bec71dc090 change vop_symlink and vop_mknod to return vpp (the created node)
refed, so that the caller can actually use it.  update callers and
file systems that implement these vnode operations
2001-07-24 15:39:30 +00:00
perseant
4e3fced95b Merge the short-lived perseant-lfsv2 branch into the trunk.
Kernels and tools understand both v1 and v2 filesystems; newfs_lfs
generates v2 by default.  Changes for the v2 layout include:

- Segments of non-PO2 size and arbitrary block offset, so these can be
  matched to convenient physical characteristics of the partition (e.g.,
  stripe or track size and offset).

- Address by fragment instead of by disk sector, paving the way for
  non-512-byte-sector devices.  In theory fragments can be as large
  as you like, though in reality they must be smaller than MAXBSIZE in size.

- Use serial number and filesystem identifier to ensure that roll-forward
  doesn't get old data and think it's new.  Roll-forward is enabled for
  v2 filesystems, though not for v1 filesystems by default.

- The inode free list is now a tailq, paving the way for undelete (undelete
  is not yet implemented, but can be without further non-backwards-compatible
  changes to disk structures).

- Inode atime information is kept in the Ifile, instead of on the inode;
  that is, the inode is never written *just* because atime was changed.
  Because of this the inodes remain near the file data on the disk, rather
  than wandering all over as the disk is read repeatedly.  This speeds up
  repeated reads by a small but noticeable amount.

Other changes of note include:

- The ifile written by newfs_lfs can now be of arbitrary length; it is no
  longer restricted to a single indirect block.

- Fixed an old bug where ctime was changed every time a vnode was created.
  I need to look more closely to make sure that the times are only updated
  during write(2) and friends, not after-the-fact during a segment write,
  and certainly not by the cleaner.
2001-07-13 20:30:18 +00:00
toshii
4866f1a22b Fix typo. s/extention/extension/ 2001-07-05 08:38:24 +00:00
mrg
67afbd6270 use _KERNEL_OPT 2001-05-30 11:57:16 +00:00
christos
bf4fd5e39c don't include lfs_extern.h; ufs/inode.h does too. 2001-02-04 21:51:19 +00:00
itohy
7c338ddc48 Call inittodr() from lfs_mountroot() so that the system time is set properly
when booted from LFS.
2001-01-26 07:59:23 +00:00
jdolecek
d9466585b7 make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const 2001-01-22 12:17:35 +00:00
joff
a6ef389457 If DIAGNOSTIC and the segment writer gets a badly sized buffer, panic()
instead of silently corrupting the filesystem.
2001-01-09 05:05:35 +00:00
cgd
1a1dca038e replace \<space(s)><newline> (wrong!) with \<newline> 2000-12-20 00:24:23 +00:00
perseant
32d11b86a5 Call uvm_vnp_setsize() in lfs_{fast,}vget to set initial vnode size. 2000-12-03 07:34:49 +00:00
perseant
72633be8c6 Fix typo in 'malloc' for non-MALLOCLOG case 2000-12-03 06:43:36 +00:00
perseant
2a53ff5ab9 Get rid of some old unnecessary code that cleared B_NEEDCOMMIT from buffers in
lfs_writeseg (possibly after they had been freed).

If MALLOCLOG is defined, make lfs_newbuf and lfs_freebuf pass along the
caller's file and line to _malloc and _free.
2000-12-03 05:56:27 +00:00
chs
65a9d68fda don't forget to set um_lognindir (now required by ufs_bmaparray()). 2000-12-03 05:27:51 +00:00
jdolecek
bf558e3b3e only include opt_ddb.h for !LKM 2000-11-30 15:59:47 +00:00
jdolecek
734f246738 no need to include fs_lfs.h, define LFS directly 2000-11-30 15:57:35 +00:00
chs
aeda8d3b77 Initial integration of the Unified Buffer Cache project. 2000-11-27 08:39:39 +00:00
perseant
0055236dda If LFS_DO_ROLLFORWARD is defined, roll forward from the older checkpoint
on mount, through the newer checkpoint and on through any newer
partial-segments that may have been written but not checkpointed because
of an intervening crash.

LFS_DO_ROLLFORWARD is not defined by default.
2000-11-27 03:33:57 +00:00
perseant
77b518b85d Use u_int32_t instead of u_long to compute LFS checksums, since the
checksum is stored in a u_int32_t.
2000-11-25 02:39:34 +00:00
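
Why the accumulator width matters when the stored checksum is 32 bits can be seen in a small sketch. The additive checksum here is a toy chosen for illustration, not the LFS checksum algorithm, and `toy_sum_wide()` and `toy_sum32()` are hypothetical names; uint64_t stands in for u_long on a platform where u_long is 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate in a 64-bit type, as u_long would on an LP64 platform.
 * The result can carry past bit 31. */
static uint64_t toy_sum_wide(const uint32_t *words, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return sum;
}

/* Accumulate in the same 32-bit type the checksum is stored in.
 * Unsigned arithmetic wraps modulo 2^32, so the result always
 * fits the stored field. */
static uint32_t toy_sum32(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return sum;
}
```

With the two words 0xFFFFFFFF and 2, the wide sum is 0x100000001 while the 32-bit sum is 1. Comparing the wide value directly against a stored 32-bit checksum would report a mismatch unless it is truncated first; computing in the 32-bit type avoids the problem.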
perseant
e4911189f1 Protect lfs_{bmapv,markv} with vfs_{un,}busy. Fix a reference/lock leak
in an error case in lfs_markv.  Change the vfs_getvfs() error to return
ENOENT, for consistency with failure of vfs_busy().

99% of this patch was from Jesse Off <joff@gci-net.com> (PR #11547).
2000-11-22 22:11:34 +00:00
perseant
c398987151 More locked_queue_* and lfs_avail accounting fixes from Jesse Off
<joff@gci-net.com>.  Remove a specious btodb() in lfs_fragextend, and
count blocks shrunk or removed by VOP_TRUNCATE in lfs_avail.
2000-11-21 00:00:31 +00:00
toshii
92a17c6ecd Make buildable again.
The previous commit was a backout of rev. 1.45, which must be an accident.
2000-11-18 02:11:23 +00:00
perseant
31fc62d4e9 Correct accounting of lfs_avail, locked_queue_count, and locked_queue_bytes.
(PR #11468).  In the case of fragment allocation, check to see if enough
space is available before extending a fragment already scheduled for writing.

The locked_queue_* variables indicate the number of buffer headers and bytes,
respectively, that are unavailable to getnewbuf() because they are locked up
waiting for LFS to flush them; make sure that that is actually what we're
counting, i.e., never count malloced buffers, and always use b_bufsize instead
of b_bcount.

If DEBUG is defined, the periodic calls to lfs_countlocked will now complain
if either counter is incorrect.  (In the future lfs_countlocked will not need
to be called at all if DEBUG is not defined.)
2000-11-17 19:14:41 +00:00
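
The counting rule described in the commit above (count only locked buffers, never malloced ones, and use b_bufsize rather than b_bcount) can be sketched as follows. `struct toy_buf` and `toy_countlocked()` are hypothetical stand-ins; the real struct buf and lfs_countlocked differ, and the `locked` and `malloced` flags here are assumptions made to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer header. */
struct toy_buf {
    long b_bcount;   /* bytes of valid data in the buffer */
    long b_bufsize;  /* bytes of buffer memory actually held */
    bool locked;     /* waiting on the locked queue to be flushed */
    bool malloced;   /* memory came from malloc, not the buffer cache */
};

/* Recount the buffer headers and bytes that are unavailable to the
 * buffer cache. Malloced buffers are skipped because they do not tie
 * up buffer cache memory, and b_bufsize is summed because that is the
 * amount of memory held, regardless of how much of it is valid data. */
static void toy_countlocked(const struct toy_buf *bufs, size_t n,
                            int *count, long *bytes)
{
    size_t i;

    *count = 0;
    *bytes = 0;
    for (i = 0; i < n; i++) {
        if (!bufs[i].locked || bufs[i].malloced)
            continue;
        (*count)++;
        *bytes += bufs[i].b_bufsize;
    }
}
```

A debugging check along the lines the commit describes could compare the running counters against the result of such a recount and complain when they differ.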
perseant
b880487624 Initialize the cleaner information in the Ifile from the same info from
the superblock at fs mount time, enabling the previous patch to fsck_lfs.
Patch from Jesse Off <joff@gci-net.com> (Closes PR #11470).
2000-11-14 00:42:55 +00:00
perseant
a07c936a59 Remove debugging code that accidentally went in with yesterday's commit. 2000-11-13 00:24:30 +00:00
perseant
c4c7b2adbb Do not needlessly dirty segment table blocks during lfs_segwrite,
preventing needless disk activity when the filesystem is idle.  (PR #10979.)
2000-11-12 07:58:36 +00:00
toshii
af22f56146 Fix obsolete comments in lfs_writeinode since rev. 1.27.
New comments are mostly from perseant, with my additions.
2000-11-12 02:13:51 +00:00
toshii
0036e468ef In lfs_fastvget(), initialize i_lfs_effnblks correctly. 2000-10-21 13:53:25 +00:00
perseant
26f26aafcd Do not increment the clean segment counter, if a segment that the cleaner
is trying to clean is already clean (e.g., if two lfs_cleanerds are running
at once.)
2000-10-20 17:48:05 +00:00
perseant
7a4d35b365 In lfs_truncate, don't overcount the real blocks removed from the inode,
when deallocating a fragment that has not made it to disk yet.

Also, during dirops, give the directory vnode an extra reference in
SET_DIROP, to ensure its continued existence during SET_ENDOP, preventing
a possible NULL-dereference there.

These two changes should close PR #11064.
2000-10-14 23:22:14 +00:00