.\" $NetBSD: buffercache.9,v 1.23 2008/05/16 09:21:59 hannken Exp $
.\"
.\" Copyright (c)2003 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"
.\" following copyright notices are from sys/kern/vfs_bio.c.
.\" they are here because i took some comments from it. yamt@NetBSD.org
.\"
.\"
.\"/*-
.\" * Copyright (c) 1982, 1986, 1989, 1993
.\" * The Regents of the University of California. All rights reserved.
.\" * (c) UNIX System Laboratories, Inc.
.\" * All or some portions of this file are derived from material licensed
.\" * to the University of California by American Telephone and Telegraph
.\" * Co. or Unix System Laboratories, Inc. and are reproduced herein with
.\" * the permission of UNIX System Laboratories, Inc.
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions
.\" * are met:
.\" * 1. Redistributions of source code must retain the above copyright
.\" * notice, this list of conditions and the following disclaimer.
.\" * 2. Redistributions in binary form must reproduce the above copyright
.\" * notice, this list of conditions and the following disclaimer in the
.\" * documentation and/or other materials provided with the distribution.
.\" * 3. Neither the name of the University nor the names of its contributors
.\" * may be used to endorse or promote products derived from this software
.\" * without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" * SUCH DAMAGE.
.\" *
.\" * @(#)vfs_bio.c 8.6 (Berkeley) 1/11/94
.\" */
.\"
.\"/*-
.\" * Copyright (c) 1994 Christopher G. Demetriou
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions
.\" * are met:
.\" * 1. Redistributions of source code must retain the above copyright
.\" * notice, this list of conditions and the following disclaimer.
.\" * 2. Redistributions in binary form must reproduce the above copyright
.\" * notice, this list of conditions and the following disclaimer in the
.\" * documentation and/or other materials provided with the distribution.
.\" * 3. All advertising materials mentioning features or use of this software
.\" * must display the following acknowledgement:
.\" * This product includes software developed by the University of
.\" * California, Berkeley and its contributors.
.\" * 4. Neither the name of the University nor the names of its contributors
.\" * may be used to endorse or promote products derived from this software
.\" * without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" * SUCH DAMAGE.
.\" *
.\" * @(#)vfs_bio.c 8.6 (Berkeley) 1/11/94
.\" */
.\"
.\"
.\" ------------------------------------------------------------
.Dd May 16, 2008
.Dt BUFFERCACHE 9
.Os
.Sh NAME
.Nm buffercache ,
.Nm bread ,
.Nm breada ,
.Nm breadn ,
.Nm bwrite ,
.Nm bawrite ,
.Nm bdwrite ,
.Nm getblk ,
.Nm geteblk ,
.Nm incore ,
.Nm allocbuf ,
.Nm brelse ,
.Nm biodone ,
.Nm biowait
.Nd buffer cache interfaces
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/buf.h
.Ft int
.Fn bread "struct vnode *vp" "daddr_t blkno" "int size" \
"struct kauth_cred *cred" "int flags" "struct buf **bpp"
.Ft int
.Fn breadn "struct vnode *vp" "daddr_t blkno" "int size" \
"daddr_t rablks[]" "int rasizes[]" "int nrablks" \
"struct kauth_cred *cred" "int flags" "struct buf **bpp"
.Ft int
.Fn breada "struct vnode *vp" "daddr_t blkno" "int size" \
"daddr_t rablkno" "int rabsize" \
"struct kauth_cred *cred" "int flags" "struct buf **bpp"
.Ft int
.Fn bwrite "struct buf *bp"
.Ft void
.Fn bawrite "struct buf *bp"
.Ft void
.Fn bdwrite "struct buf *bp"
.Ft struct buf *
.Fn getblk "struct vnode *vp" "daddr_t blkno" "int size" \
"int slpflag" "int slptimeo"
.Ft struct buf *
.Fn geteblk "int size"
.Ft struct buf *
.Fn incore "struct vnode *vp" "daddr_t blkno"
.Ft void
.Fn allocbuf "struct buf *bp" "int size" "int preserve"
.Ft void
.Fn brelse "struct buf *bp"
.Ft void
.Fn biodone "struct buf *bp"
.Ft int
.Fn biowait "struct buf *bp"
.\" ------------------------------------------------------------
.Sh DESCRIPTION
The
.Nm
interface is used by file systems to improve I/O performance using
in-core caches of filesystem blocks.
.Pp
The kernel memory used to cache a block is called a buffer and
described by a
.Em buf
structure.
In addition to describing a cached block, a
.Em buf
structure is also used to describe an I/O request as a part of
the disk driver interface.
.\" XXX struct buf, B_ flags, MP locks, etc
.\" ------------------------------------------------------------
.Sh FUNCTIONS
.Bl -tag -width compact
.It Fn bread "vp" "blkno" "size" "cred" "flags" "bpp"
Read a block corresponding to
.Fa vp
and
.Fa blkno .
The buffer is returned via
.Fa bpp .
The units of
.Fa blkno
are specifically the units used by the
.Fn VOP_STRATEGY
routine for the
.Fa vp
vnode.
For device special files,
.Fa blkno
is in units of
.Dv DEV_BSIZE
and both
.Fa blkno
and
.Fa size
must be multiples of the underlying device's block size.
For other files,
.Fa blkno
is in units chosen by the file system containing
.Fa vp .
.Pp
If the buffer is not found (i.e. the block is not cached in memory),
.Fn bread
allocates a buffer with enough pages for
.Fa size
and reads the specified disk block into it using
credential
.Fa cred .
.Pp
If
.Dv B_MODIFY
is set in
.Fa flags ,
the caller intends to modify the returned buffer.
.Pp
The buffer returned by
.Fn bread
is marked as busy.
(The
.Dv B_BUSY
flag is set.)
After manipulation of the buffer returned from
.Fn bread ,
the caller should unbusy it so that another thread can get it.
If the buffer contents are modified and should be written back to disk,
it should be unbusied using one of the variants of
.Fn bwrite .
Otherwise, it should be unbusied using
.Fn brelse .
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn breadn "vp" "blkno" "size" "rablks" "rasizes" "nrablks" "cred" "flags" \
"bpp"
Get a buffer in the same way as
.Fn bread .
In addition,
.Fn breadn
starts read-ahead of the blocks specified by
.Fa rablks ,
.Fa rasizes ,
and
.Fa nrablks .
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn breada "vp" "blkno" "size" "rablkno" "rabsize" "cred" "flags" "bpp"
Same as
.Fn breadn
with a single read-ahead block.
This function exists for compatibility with old file system code and
should not be used by new code.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn bwrite "bp"
Write a block.
Start I/O for write using
.Fn VOP_STRATEGY .
Then, unless the
.Dv B_ASYNC
flag is set in
.Fa bp ,
.Fn bwrite
waits for the I/O to complete.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn bawrite "bp"
Write a block asynchronously.
Set the
.Dv B_ASYNC
flag in
.Fa bp
and simply call
.Fn VOP_BWRITE ,
which results in
.Fn bwrite
for most filesystems.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn bdwrite "bp"
Delayed write.
Unlike
.Fn bawrite ,
.Fn bdwrite
won't start any I/O.
It only marks the buffer as dirty
.Pq Dv B_DELWRI
and unbusies it.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn getblk "vp" "blkno" "size" "slpflag" "slptimeo"
Get a block of requested size
.Fa size
that is associated with a given vnode and block
offset, specified by
.Fa vp
and
.Fa blkno .
If it is found in the block cache, make it busy and return it.
Otherwise, return an empty block of the correct size.
It is up to the caller to ensure that the cached blocks
are of the correct size.
.Pp
If
.Fn getblk
needs to sleep,
.Fa slpflag
and
.Fa slptimeo
are used as arguments for
.Fn cv_timedwait .
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn geteblk "size"
Allocate an empty, disassociated block of a given size
.Fa size .
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn incore "vp" "blkno"
Determine if a block associated with a given vnode and block offset
is in the cache.
If it is there, return a pointer to it.
Note that, unlike
.Fn getblk ,
.Fn incore
does not busy the buffer.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn allocbuf "bp" "size" "preserve"
Expand or contract the actual memory allocated to a buffer.
If
.Fa preserve
is zero, all data in the buffer is lost.
Otherwise, if the buffer shrinks, the truncated part of the data
is lost, so it is up to the caller to have written
it out
.Em first
if needed; this routine will not start a write.
If the buffer grows, it is the caller's responsibility to fill in
the buffer's additional contents.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn brelse "bp"
Unbusy a buffer and release it to the free lists.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn biodone "bp"
Mark I/O complete on a buffer.
If a callback has been requested by
.Dv B_CALL ,
invoke it.
Otherwise, wake up any waiters.
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.It Fn biowait "bp"
Wait for operations on the buffer to complete.
When they do, extract and return the I/O's error value.
.El
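.\" ------------------------------------------------------------
.Sh EXAMPLES
The busy and unbusy rules described above can be illustrated with a
small userland toy model.
This is only a sketch under stated assumptions: it is not the kernel
implementation, it performs no I/O, and every
.Li toy_
and
.Li TOY_
name is hypothetical.
Where the kernel would sleep waiting for a busy buffer, the model
simply returns
.Dv NULL .
.Bd -literal
```c
/* Userland toy model of the busy/unbusy protocol; NOT kernel code. */
#include <stddef.h>
#include <string.h>

#define TOY_BUSY	0x01	/* models B_BUSY: owned by a caller */
#define TOY_DELWRI	0x02	/* models B_DELWRI: dirty, write delayed */
#define TOY_NBUF	4	/* number of slots in the toy cache */
#define TOY_BSIZE	16	/* bytes of data per toy buffer */

struct toy_buf {
	int	tb_valid;		/* slot holds a cached block */
	long	tb_blkno;		/* block number, the cache key */
	int	tb_flags;		/* TOY_BUSY and TOY_DELWRI bits */
	char	tb_data[TOY_BSIZE];	/* cached contents */
};

static struct toy_buf toy_cache[TOY_NBUF];

/* Like incore(): look the block up, but do not busy it. */
struct toy_buf *
toy_incore(long blkno)
{
	int i;

	for (i = 0; i < TOY_NBUF; i++)
		if (toy_cache[i].tb_valid && toy_cache[i].tb_blkno == blkno)
			return &toy_cache[i];
	return NULL;
}

/*
 * Like getblk(): return the cached block, or an empty one, marked busy.
 * Returns NULL where the kernel would sleep (busy buffer, no free slot).
 */
struct toy_buf *
toy_getblk(long blkno)
{
	struct toy_buf *bp = toy_incore(blkno);
	int i;

	if (bp != NULL) {
		if (bp->tb_flags & TOY_BUSY)
			return NULL;
	} else {
		for (i = 0; i < TOY_NBUF && bp == NULL; i++)
			if (!toy_cache[i].tb_valid)
				bp = &toy_cache[i];
		if (bp == NULL)
			return NULL;
		memset(bp, 0, sizeof(*bp));
		bp->tb_valid = 1;
		bp->tb_blkno = blkno;
	}
	bp->tb_flags |= TOY_BUSY;
	return bp;
}

/* Like brelse(): unbusy the buffer so another caller can get it. */
void
toy_brelse(struct toy_buf *bp)
{
	bp->tb_flags &= ~TOY_BUSY;
}

/* Like bdwrite(): mark the buffer dirty and unbusy it; start no I/O. */
void
toy_bdwrite(struct toy_buf *bp)
{
	bp->tb_flags |= TOY_DELWRI;
	toy_brelse(bp);
}

/* Walk through the protocol; returns 0 if every step behaves as described. */
int
toy_demo(void)
{
	struct toy_buf *bp = toy_getblk(7);

	if (bp == NULL || !(bp->tb_flags & TOY_BUSY))
		return 1;
	if (toy_getblk(7) != NULL)	/* a second caller would have to wait */
		return 2;
	if (toy_incore(7) != bp)	/* the lookup still finds the buffer */
		return 3;
	memcpy(bp->tb_data, "new", 4);	/* modify the contents */
	toy_bdwrite(bp);		/* dirty and unbusy, no write yet */
	if (bp->tb_flags != TOY_DELWRI)
		return 4;
	bp = toy_getblk(7);		/* the buffer is available again */
	if (bp == NULL || strcmp(bp->tb_data, "new") != 0)
		return 5;
	toy_brelse(bp);
	return 0;
}
```
.Ed
.Pp
In this model a caller that changed a buffer hands it back with
.Li toy_bdwrite ,
and a caller that only read it hands it back with
.Li toy_brelse ,
mirroring the choice between the
.Fn bwrite
variants and
.Fn brelse
described above.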
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
This section describes places within the
.Nx
source tree where actual code implementing the buffer cache subsystem
can be found.
All pathnames are relative to
.Pa /usr/src .
.Pp
The buffer cache subsystem is implemented within the file
.Pa sys/kern/vfs_bio.c .
.Sh SEE ALSO
.Xr intro 9 ,
.Xr vnode 9
.Rs
.%A Maurice J. Bach
.%B "The Design of the UNIX Operating System"
.%I "Prentice Hall"
.%D 1986
.Re
.Rs
.%A Marshall Kirk McKusick
.%A Keith Bostic
.%A Michael J. Karels
.%A John S. Quarterman
.%B "The Design and Implementation of the 4.4BSD Operating System"
.%I "Addison Wesley"
.%D 1996
.Re
.\" ------------------------------------------------------------
.Sh BUGS
In the current implementation,
.Fn bread
and its variants
do not use the specified credential.
.Pp
Because
.Fn biodone
and
.Fn biowait
do not really belong to
.Nm ,
they shouldn't be documented here.