/* $NetBSD: lfs_bio.c,v 1.35 2000/12/03 06:43:36 perseant Exp $ */
/*-
 * Copyright (c) 1999, 2000 The NetBSD Foundation, Inc.
 * All rights reserved.
 *
 * This code is derived from software contributed to The NetBSD Foundation
 * by Konrad E. Schroder <perseant@hhhh.org>.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *        This product includes software developed by the NetBSD
 *        Foundation, Inc. and its contributors.
 * 4. Neither the name of The NetBSD Foundation nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/*
 * Copyright (c) 1991, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)lfs_bio.c	8.10 (Berkeley) 6/10/95
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/buf.h>
#include <sys/vnode.h>
#include <sys/resourcevar.h>
#include <sys/mount.h>
#include <sys/kernel.h>

#include <ufs/ufs/quota.h>
#include <ufs/ufs/inode.h>
#include <ufs/ufs/ufsmount.h>
#include <ufs/ufs/ufs_extern.h>

#include <sys/malloc.h>
#include <ufs/lfs/lfs.h>
#include <ufs/lfs/lfs_extern.h>

/* Macros to clear/set/test flags. */
# define	SET(t, f)	(t) |= (f)
# define	CLR(t, f)	(t) &= ~(f)
# define	ISSET(t, f)	((t) & (f))
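
/*
 * For example, SET(bp->b_flags, B_DELWRI) marks a buffer for delayed
 * write, ISSET(bp->b_flags, B_CALL) tests whether a buffer carries an
 * I/O-completion callback, and CLR() clears a flag again.
 */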
/*
 * LFS block write function.
 *
 * XXX
 * No write cost accounting is done.
 * This is almost certainly wrong for synchronous operations and NFS.
 */
int lfs_allclean_wakeup; /* Cleaner wakeup address. */
int	locked_queue_count   = 0;	/* XXX Count of locked-down buffers. */
long	locked_queue_bytes   = 0L;	/* XXX Total size of locked buffers. */
int	lfs_writing          = 0;	/* Set if already kicked off a writer
					   because of buffer space */
extern int lfs_dostats;
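
/*
 * locked_queue_count and locked_queue_bytes are recomputed by lfs_flush()
 * below, via lfs_countlocked(), and lfs_flush() issues a wakeup() on
 * locked_queue_count for anyone waiting for buffer space.
 */
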
/*
 * Try to reserve some blocks, prior to performing a sensitive operation that
 * requires the vnode lock to be honored.  If there is not enough space, give
 * up the vnode lock temporarily and wait for the space to become available.
 *
 * Called with vp locked.  (Note however that if nb < 0, vp is ignored.)
 */
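/*
 * Illustrative call pattern (hypothetical caller): an operation that may
 * allocate up to "nblocks" disk blocks while holding the vnode lock would
 * typically bracket itself with a matching reserve/release pair, passing
 * a negative count to release the reservation afterwards:
 *
 *	error = lfs_reserve(fs, vp, nblocks);
 *	if (error)
 *		return (error);
 *	...allocate and dirty up to nblocks disk blocks...
 *	lfs_reserve(fs, vp, -nblocks);
 */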
int
lfs_reserve(fs, vp, nb)
	struct lfs *fs;
	struct vnode *vp;
	int nb;
{
	CLEANERINFO *cip;
	struct buf *bp;
	int error, slept;

	slept = 0;
	while (nb > 0 && !lfs_fits(fs, nb + fs->lfs_ravail)) {
		VOP_UNLOCK(vp, 0);

		if (!slept) {
#ifdef DEBUG
			printf("lfs_reserve: waiting for %ld (bfree = %d,"
			       " est_bfree = %d)\n",
			       nb + fs->lfs_ravail, fs->lfs_bfree,
			       LFS_EST_BFREE(fs));
#endif
		}
		++slept;

		/* Wake up the cleaner */
		LFS_CLEANERINFO(cip, fs, bp);
		LFS_SYNC_CLEANERINFO(cip, fs, bp, 0);
		wakeup(&lfs_allclean_wakeup);
		wakeup(&fs->lfs_nextseg);

		error = tsleep(&fs->lfs_avail, PCATCH | PUSER, "lfs_reserve",
			       0);
		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); /* XXX use lockstatus */
		if (error)
			return error;
	}
	if (slept)
		printf("lfs_reserve: woke up\n");
	fs->lfs_ravail += nb;
	return 0;
}

/*
 *
 * XXX we don't let meta-data writes run out of space because they can
 * come from the segment writer.  We need to make sure that there is
 * enough space reserved so that there's room to write meta-data
 * blocks.
 *
 * Also, we don't let blocks that have come to us from the cleaner
 * run out of space.
 */
#define CANT_WAIT(BP,F) (IS_IFILE((BP)) || (BP)->b_lblkno<0 || ((F) & BW_CLEAN))
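
/*
 * That is: ifile buffers, indirect blocks (which have negative logical
 * block numbers), and blocks fed to us by the cleaner (BW_CLEAN) are
 * never made to wait for free space.
 */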

int
lfs_bwrite(v)
	void *v;
{
	struct vop_bwrite_args /* {
		struct buf *a_bp;
	} */ *ap = v;
	struct buf *bp = ap->a_bp;
	struct inode *ip;

	ip = VTOI(bp->b_vp);

#ifdef DIAGNOSTIC
	if (VTOI(bp->b_vp)->i_lfs->lfs_ronly == 0 && (bp->b_flags & B_ASYNC)) {
		panic("bawrite LFS buffer");
	}
#endif /* DIAGNOSTIC */
	return lfs_bwrite_ext(bp, 0);
}

/*
 * Determine if there is enough room currently available to write db
 * disk blocks.  We need enough blocks for the new blocks, the current
 * inode blocks, a summary block, plus potentially the ifile inode and
 * the segment usage table, plus an ifile page.
 */
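/*
 * Roughly, the computation below is
 *
 *	needed = db                            (the new data blocks)
 *	       + btodb(LFS_SUMMARY_SIZE)       (a summary block)
 *	       + blocks for lfs_uinodes + 1 dirty inodes
 *	       + lfs_segtabsz                  (the segment usage table)
 *	       + 1                             (an ifile page)
 *
 * with the filesystem-block terms converted to disk blocks via fsbtodb().
 */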
int
lfs_fits(struct lfs *fs, int db)
{
	int needed;

	needed = db + btodb(LFS_SUMMARY_SIZE) +
		fsbtodb(fs, howmany(fs->lfs_uinodes + 1, INOPB(fs)) +
			fs->lfs_segtabsz + 1);

	if (needed >= fs->lfs_avail) {
#ifdef DEBUG
		printf("lfs_fits: no fit: db = %d, uinodes = %d, "
		       "needed = %d, avail = %d\n",
		       db, fs->lfs_uinodes, needed, fs->lfs_avail);
#endif
		return 0;
	}
	return 1;
}

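/*
 * Wait until db disk blocks fit (in the sense of lfs_fits() above),
 * waking the cleaner as necessary.  Unlike lfs_reserve(), this is used
 * for a specific buffer, just before its blocks are charged against
 * lfs_avail in lfs_bwrite_ext() below.
 */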
int
lfs_availwait(fs, db)
	struct lfs *fs;
	int db;
{
	int error;
	CLEANERINFO *cip;
	struct buf *cbp;

	while (!lfs_fits(fs, db)) {
		/*
		 * Out of space, need cleaner to run.
		 * Update the cleaner info, then wake it up.
		 * Note the cleanerinfo block is on the ifile
		 * so it CANT_WAIT.
		 */
		LFS_CLEANERINFO(cip, fs, cbp);
		LFS_SYNC_CLEANERINFO(cip, fs, cbp, 0);

		printf("lfs_availwait: out of available space, "
		       "waiting on cleaner\n");

		wakeup(&lfs_allclean_wakeup);
		wakeup(&fs->lfs_nextseg);
#ifdef DIAGNOSTIC
		if (fs->lfs_seglock && fs->lfs_lockpid == curproc->p_pid)
			panic("lfs_availwait: deadlock");
#endif
		error = tsleep(&fs->lfs_avail, PCATCH | PUSER, "cleaner", 0);
		if (error)
			return (error);
	}
	return 0;
}

int
lfs_bwrite_ext(bp, flags)
	struct buf *bp;
	int flags;
{
	struct lfs *fs;
	struct inode *ip;
	int db, error, s;

	/*
	 * Don't write *any* blocks if we're mounted read-only.
	 * In particular the cleaner can't write blocks either.
	 */
	if (VTOI(bp->b_vp)->i_lfs->lfs_ronly) {
		bp->b_flags &= ~(B_DELWRI | B_READ | B_ERROR);
		LFS_UNLOCK_BUF(bp);
		if (bp->b_flags & B_CALL)
			bp->b_flags &= ~B_BUSY;
		else
			brelse(bp);
		return EROFS;
	}

	/*
	 * Set the delayed write flag and use reassignbuf to move the buffer
	 * from the clean list to the dirty one.
	 *
	 * Set the B_LOCKED flag and unlock the buffer, causing brelse to move
	 * the buffer onto the LOCKED free list.  This is necessary, otherwise
	 * getnewbuf() would try to reclaim the buffers using bawrite, which
	 * isn't going to work.
	 *
	 * XXX we don't let meta-data writes run out of space because they can
	 * come from the segment writer.  We need to make sure that there is
	 * enough space reserved so that there's room to write meta-data
	 * blocks.
	 */
	if (!(bp->b_flags & B_LOCKED)) {
		fs = VFSTOUFS(bp->b_vp->v_mount)->um_lfs;
		db = fragstodb(fs, numfrags(fs, bp->b_bcount));
		if (!CANT_WAIT(bp, flags)) {
			if ((error = lfs_availwait(fs, db)) != 0) {
				brelse(bp);
				return error;
			}
		}

		ip = VTOI(bp->b_vp);
		if (bp->b_flags & B_CALL) {
			LFS_SET_UINO(ip, IN_CLEANING);
		} else {
			LFS_SET_UINO(ip, IN_CHANGE | IN_MODIFIED | IN_UPDATE);
		}
		fs->lfs_avail -= db;
		bp->b_flags |= B_DELWRI;

		LFS_LOCK_BUF(bp);
		bp->b_flags &= ~(B_READ | B_ERROR);
		s = splbio();
		reassignbuf(bp, bp->b_vp);
		splx(s);

	}

	if (bp->b_flags & B_CALL)
		bp->b_flags &= ~B_BUSY;
	else
		brelse(bp);

	return (0);
}

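/*
 * Write everything dirty on the given filesystem: bump fs->lfs_writer to
 * disallow directory operations for the duration, call lfs_segwrite(),
 * then wake any waiting dirops once the writer count drops back to zero.
 */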
void
lfs_flush_fs(fs, flags)
	struct lfs *fs;
	int flags;
{
	if (fs->lfs_ronly == 0 && fs->lfs_dirops == 0)
	{
		/* disallow dirops during flush */
		fs->lfs_writer++;

		/*
		 * We set the queue to 0 here because we
		 * are about to write all the dirty
		 * buffers we have.  If more come in
		 * while we're writing the segment, they
		 * may not get written, so we want the
		 * count to reflect these new writes
		 * after the segwrite completes.
		 */
		if (lfs_dostats)
			++lfs_stats.flush_invoked;
		lfs_segwrite(fs->lfs_ivnode->v_mount, flags);

		/* XXX KS - allow dirops again */
		if (--fs->lfs_writer == 0)
			wakeup(&fs->lfs_dirops);
	}
}

/*
 * XXX
 * This routine flushes buffers out of the B_LOCKED queue when LFS has too
 * many locked down.  Eventually the pageout daemon will simply call LFS
 * when pages need to be reclaimed.  Note, we have one static count of locked
 * buffers, so we can't have more than a single file system.  To make this
 * work for multiple file systems, put the count into the mount structure.
 */
void
lfs_flush(fs, flags)
	struct lfs *fs;
	int flags;
{
	int s;
	struct mount *mp, *nmp;

	if (lfs_dostats)
		++lfs_stats.write_exceeded;
	if (lfs_writing && flags == 0) {/* XXX flags */
#ifdef DEBUG_LFS
		printf("lfs_flush: not flushing because another flush is active\n");
#endif
		return;
	}
	lfs_writing = 1;

	simple_lock(&mountlist_slock);
	for (mp = mountlist.cqh_first; mp != (void *)&mountlist; mp = nmp) {
		if (vfs_busy(mp, LK_NOWAIT, &mountlist_slock)) {
			nmp = mp->mnt_list.cqe_next;
			continue;
		}
		if (strncmp(&mp->mnt_stat.f_fstypename[0], MOUNT_LFS, MFSNAMELEN) == 0)
			lfs_flush_fs(((struct ufsmount *)mp->mnt_data)->ufsmount_u.lfs, flags);
		simple_lock(&mountlist_slock);
		nmp = mp->mnt_list.cqe_next;
		vfs_unbusy(mp);
	}
	simple_unlock(&mountlist_slock);

#if 1 || defined(DEBUG)
2000-09-09 08:49:54 +04:00
|
|
|
s = splbio();
|
|
|
|
lfs_countlocked(&locked_queue_count, &locked_queue_bytes);
|
|
|
|
splx(s);
|
1999-03-10 03:20:00 +03:00
|
|
|
wakeup(&locked_queue_count);
|
2000-11-17 22:14:41 +03:00
|
|
|
#endif /* 1 || DEBUG */
|
1999-03-10 03:20:00 +03:00
|
|
|
|
1994-06-08 15:41:58 +04:00
|
|
|
lfs_writing = 0;
|
|
|
|
}
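/*
 * Illustrative caller sketch (hypothetical; not part of this file).
 * Because lfs_flush() returns quietly when another flush is already
 * active and no flags were given, a caller that must guarantee a
 * checkpoint passes SEGM_CKP and brackets the call the same way
 * lfs_check() does below:
 *
 *	++fs->lfs_writer;
 *	lfs_flush(fs, SEGM_CKP);
 *	if (--fs->lfs_writer == 0)
 *		wakeup(&fs->lfs_dirops);
 */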
|
|
|
|
|
|
|
|
int
|
1999-03-10 03:20:00 +03:00
|
|
|
lfs_check(vp, blkno, flags)
|
1994-06-08 15:41:58 +04:00
|
|
|
struct vnode *vp;
|
1998-03-01 05:20:01 +03:00
|
|
|
ufs_daddr_t blkno;
|
1999-03-10 03:20:00 +03:00
|
|
|
int flags;
|
1994-06-08 15:41:58 +04:00
|
|
|
{
|
|
|
|
int error;
|
1999-04-12 04:36:47 +04:00
|
|
|
struct lfs *fs;
|
2000-05-27 04:19:52 +04:00
|
|
|
struct inode *ip;
|
1999-11-06 23:33:05 +03:00
|
|
|
extern int lfs_dirvcount;
|
1999-04-12 04:36:47 +04:00
|
|
|
|
1994-06-08 15:41:58 +04:00
|
|
|
error = 0;
|
2000-05-27 04:19:52 +04:00
|
|
|
ip = VTOI(vp);
|
1999-03-10 03:20:00 +03:00
|
|
|
|
1994-06-08 15:41:58 +04:00
|
|
|
/* If out of buffers, wait on writer */
|
1999-03-10 03:20:00 +03:00
|
|
|
/* XXX KS - if it's the Ifile, we're probably the cleaner! */
|
2000-05-27 04:19:52 +04:00
|
|
|
if (ip->i_number == LFS_IFILE_INUM)
|
1999-03-10 03:20:00 +03:00
|
|
|
return 0;
|
2000-05-27 04:19:52 +04:00
|
|
|
/* If we're being called from inside a dirop, don't sleep */
|
|
|
|
if (ip->i_flag & IN_ADIROP)
|
1999-04-12 04:36:47 +04:00
|
|
|
return 0;
|
|
|
|
|
2000-05-27 04:19:52 +04:00
|
|
|
fs = ip->i_lfs;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we would flush below, but dirops are active, sleep.
|
|
|
|
* Note that a dirop cannot ever reach this code!
|
|
|
|
*/
|
|
|
|
while (fs->lfs_dirops > 0 &&
|
|
|
|
(locked_queue_count > LFS_MAX_BUFS ||
|
|
|
|
locked_queue_bytes > LFS_MAX_BYTES ||
|
|
|
|
lfs_dirvcount > LFS_MAXDIROP || fs->lfs_diropwait > 0))
|
|
|
|
{
|
|
|
|
++fs->lfs_diropwait;
|
|
|
|
tsleep(&fs->lfs_writer, PRIBIO+1, "bufdirop", 0);
|
|
|
|
--fs->lfs_diropwait;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (locked_queue_count > LFS_MAX_BUFS ||
|
|
|
|
locked_queue_bytes > LFS_MAX_BYTES ||
|
|
|
|
lfs_dirvcount > LFS_MAXDIROP || fs->lfs_diropwait > 0)
|
1999-03-10 03:20:00 +03:00
|
|
|
{
|
1999-04-12 04:36:47 +04:00
|
|
|
++fs->lfs_writer;
|
|
|
|
lfs_flush(fs, flags);
|
|
|
|
if(--fs->lfs_writer==0)
|
|
|
|
wakeup(&fs->lfs_dirops);
|
1999-03-10 03:20:00 +03:00
|
|
|
}
|
1999-06-01 07:00:40 +04:00
|
|
|
|
2000-11-27 06:33:57 +03:00
|
|
|
while (locked_queue_count > LFS_WAIT_BUFS
|
|
|
|
|| locked_queue_bytes > LFS_WAIT_BYTES)
|
1999-03-10 03:20:00 +03:00
|
|
|
{
|
|
|
|
if(lfs_dostats)
|
|
|
|
++lfs_stats.wait_exceeded;
|
2000-11-27 06:33:57 +03:00
|
|
|
#ifdef DEBUG
|
1999-12-15 10:10:32 +03:00
|
|
|
printf("lfs_check: waiting: count=%d, bytes=%ld\n",
|
|
|
|
locked_queue_count, locked_queue_bytes);
|
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
error = tsleep(&locked_queue_count, PCATCH | PUSER,
|
|
|
|
"buffers", hz * LFS_BUFWAIT);
|
2000-11-27 06:33:57 +03:00
|
|
|
if (error != EWOULDBLOCK)
|
|
|
|
break;
|
2000-05-06 00:59:20 +04:00
|
|
|
/*
|
|
|
|
* lfs_flush might not flush all the buffers, if some of the
|
2000-11-27 06:33:57 +03:00
|
|
|
* inodes were locked or if most of them were Ifile blocks
|
|
|
|
* and we weren't asked to checkpoint. Try flushing again
|
|
|
|
* to keep us from blocking indefinitely.
|
2000-05-06 00:59:20 +04:00
|
|
|
*/
|
2000-05-27 04:19:52 +04:00
|
|
|
if (locked_queue_count > LFS_MAX_BUFS ||
|
|
|
|
locked_queue_bytes > LFS_MAX_BYTES)
|
2000-05-06 00:59:20 +04:00
|
|
|
{
|
|
|
|
++fs->lfs_writer;
|
2000-11-27 06:33:57 +03:00
|
|
|
lfs_flush(fs, flags | SEGM_CKP);
|
2000-05-06 00:59:20 +04:00
|
|
|
if(--fs->lfs_writer==0)
|
|
|
|
wakeup(&fs->lfs_dirops);
|
|
|
|
}
|
1999-03-10 03:20:00 +03:00
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
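/*
 * Usage sketch (hypothetical caller; not taken from this file).  The
 * write path is expected to throttle itself by calling lfs_check()
 * before dirtying a block: crossing LFS_MAX_BUFS/LFS_MAX_BYTES kicks
 * off a flush, and crossing LFS_WAIT_BUFS/LFS_WAIT_BYTES puts the
 * caller to sleep on locked_queue_count until the writer catches up:
 *
 *	if ((error = lfs_check(vp, lbn, 0)) != 0)
 *		return (error);
 *	... allocate and dirty the block ...
 */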
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate a new buffer header.
|
|
|
|
*/
|
2000-12-03 08:56:27 +03:00
|
|
|
#ifdef MALLOCLOG
|
|
|
|
# define DOMALLOC(S, T, F) _malloc((S), (T), (F), file, line)
|
|
|
|
struct buf *
|
|
|
|
lfs_newbuf_malloclog(vp, daddr, size, file, line)
|
|
|
|
struct vnode *vp;
|
|
|
|
ufs_daddr_t daddr;
|
|
|
|
size_t size;
|
|
|
|
char *file;
|
|
|
|
int line;
|
|
|
|
#else
|
2000-12-03 09:43:36 +03:00
|
|
|
# define DOMALLOC(S, T, F) malloc((S), (T), (F))
|
1999-03-10 03:20:00 +03:00
|
|
|
struct buf *
|
|
|
|
lfs_newbuf(vp, daddr, size)
|
|
|
|
struct vnode *vp;
|
|
|
|
ufs_daddr_t daddr;
|
|
|
|
size_t size;
|
2000-12-03 08:56:27 +03:00
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
{
|
|
|
|
struct buf *bp;
|
|
|
|
size_t nbytes;
|
|
|
|
int s;
|
|
|
|
|
|
|
|
nbytes = roundup(size, DEV_BSIZE);
|
|
|
|
|
2000-12-03 08:56:27 +03:00
|
|
|
bp = DOMALLOC(sizeof(struct buf), M_SEGMENT, M_WAITOK);
|
1999-03-10 03:20:00 +03:00
|
|
|
bzero(bp, sizeof(struct buf));
|
|
|
|
if (nbytes)
|
2000-12-03 08:56:27 +03:00
|
|
|
bp->b_data = DOMALLOC(nbytes, M_SEGMENT, M_WAITOK);
|
1999-03-10 03:20:00 +03:00
|
|
|
if(nbytes) {
|
|
|
|
bzero(bp->b_data, nbytes);
|
|
|
|
}
|
|
|
|
#ifdef DIAGNOSTIC
|
|
|
|
if(vp==NULL)
|
|
|
|
panic("vp is NULL in lfs_newbuf");
|
|
|
|
if(bp==NULL)
|
|
|
|
panic("bp is NULL after malloc in lfs_newbuf");
|
1994-06-08 15:41:58 +04:00
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
s = splbio();
|
|
|
|
bgetvp(vp, bp);
|
|
|
|
splx(s);
|
|
|
|
|
|
|
|
bp->b_bufsize = size;
|
|
|
|
bp->b_bcount = size;
|
|
|
|
bp->b_lblkno = daddr;
|
|
|
|
bp->b_blkno = daddr;
|
|
|
|
bp->b_error = 0;
|
|
|
|
bp->b_resid = 0;
|
|
|
|
bp->b_iodone = lfs_callback;
|
|
|
|
bp->b_flags |= B_BUSY | B_CALL | B_NOCACHE;
|
|
|
|
|
|
|
|
return (bp);
|
|
|
|
}
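/*
 * Pairing sketch (hypothetical caller; sizes and names are assumptions,
 * not taken from this file).  Buffers from lfs_newbuf() bypass the
 * buffer cache (B_NOCACHE) and are released with lfs_freebuf() below
 * rather than brelse(); since b_iodone is set to lfs_callback above,
 * that release presumably happens from the I/O-done path once the
 * write completes:
 *
 *	bp = lfs_newbuf(devvp, daddr, size);
 *	bcopy(src, bp->b_data, size);
 *	VOP_BWRITE(bp);
 */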
|
|
|
|
|
2000-12-03 08:56:27 +03:00
|
|
|
#ifdef MALLOCLOG
|
|
|
|
# define DOFREE(A, T) _free((A), (T), file, line)
|
|
|
|
void
|
|
|
|
lfs_freebuf_malloclog(bp, file, line)
|
|
|
|
struct buf *bp;
|
|
|
|
char *file;
|
|
|
|
int line;
|
|
|
|
#else
|
|
|
|
# define DOFREE(A, T) free((A), (T))
|
1999-03-10 03:20:00 +03:00
|
|
|
void
|
|
|
|
lfs_freebuf(bp)
|
|
|
|
struct buf *bp;
|
2000-12-03 08:56:27 +03:00
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
{
|
|
|
|
int s;
|
|
|
|
|
|
|
|
s = splbio();
|
|
|
|
if(bp->b_vp)
|
|
|
|
brelvp(bp);
|
|
|
|
splx(s);
|
|
|
|
if (!(bp->b_flags & B_INVAL)) { /* B_INVAL indicates a "fake" buffer */
|
2000-12-03 08:56:27 +03:00
|
|
|
DOFREE(bp->b_data, M_SEGMENT);
|
1999-03-10 03:20:00 +03:00
|
|
|
bp->b_data = NULL;
|
1994-06-08 15:41:58 +04:00
|
|
|
}
|
2000-12-03 08:56:27 +03:00
|
|
|
DOFREE(bp, M_SEGMENT);
|
1999-03-10 03:20:00 +03:00
|
|
|
}
|
1994-06-08 15:41:58 +04:00
|
|
|
|
1999-03-10 03:20:00 +03:00
|
|
|
/*
|
|
|
|
* Definitions for the buffer free lists.
|
|
|
|
*/
|
|
|
|
#define BQUEUES 4 /* number of free buffer queues */
|
|
|
|
|
|
|
|
#define BQ_LOCKED 0 /* super-blocks &c */
|
|
|
|
#define BQ_LRU 1 /* lru, useful buffers */
|
|
|
|
#define BQ_AGE 2 /* rubbish */
|
|
|
|
#define BQ_EMPTY 3 /* buffer headers with no memory */
|
|
|
|
|
|
|
|
extern TAILQ_HEAD(bqueues, buf) bufqueues[BQUEUES];
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return a count of buffers on the "locked" queue.
|
2000-09-09 08:49:54 +04:00
|
|
|
* Don't count malloced buffers, since they don't detract from the total.
|
1999-03-10 03:20:00 +03:00
|
|
|
*/
|
|
|
|
void
|
|
|
|
lfs_countlocked(count, bytes)
|
|
|
|
int *count;
|
|
|
|
long *bytes;
|
|
|
|
{
|
2000-03-30 16:41:09 +04:00
|
|
|
struct buf *bp;
|
|
|
|
int n = 0;
|
|
|
|
long int size = 0L;
|
1999-03-10 03:20:00 +03:00
|
|
|
|
|
|
|
for (bp = bufqueues[BQ_LOCKED].tqh_first; bp;
|
|
|
|
bp = bp->b_freelist.tqe_next) {
|
2000-09-09 08:49:54 +04:00
|
|
|
if (bp->b_flags & B_CALL) /* Malloced buffer */
|
|
|
|
continue;
|
1999-03-10 03:20:00 +03:00
|
|
|
n++;
|
|
|
|
size += bp->b_bufsize;
|
2000-09-09 08:49:54 +04:00
|
|
|
#ifdef DEBUG_LOCKED_LIST
|
|
|
|
if (n > nbuf)
|
|
|
|
panic("lfs_countlocked: this can't happen: more"
|
|
|
|
" buffers locked than exist");
|
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
}
|
2000-11-17 22:14:41 +03:00
|
|
|
#ifdef DEBUG
|
2000-09-09 08:49:54 +04:00
|
|
|
/* Theoretically this function never really does anything */
|
|
|
|
if (n != *count)
|
|
|
|
printf("lfs_countlocked: adjusted buf count from %d to %d\n",
|
|
|
|
*count, n);
|
|
|
|
if (size != *bytes)
|
|
|
|
printf("lfs_countlocked: adjusted byte count from %ld to %ld\n",
|
|
|
|
*bytes, size);
|
|
|
|
#endif
|
1999-03-10 03:20:00 +03:00
|
|
|
*count = n;
|
|
|
|
*bytes = size;
|
|
|
|
return;
|
1994-06-08 15:41:58 +04:00
|
|
|
}
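/*
 * Usage sketch, mirroring the call made from lfs_flush() above: the
 * counters are resynchronized from the BQ_LOCKED queue at splbio(),
 * which corrects any drift in the incremental accounting done
 * elsewhere:
 *
 *	s = splbio();
 *	lfs_countlocked(&locked_queue_count, &locked_queue_bytes);
 *	splx(s);
 */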
|