.\" $NetBSD: vnode.9,v 1.82 2019/01/01 10:06:54 hannken Exp $
.\"
.\" Copyright (c) 2001, 2005, 2006 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd January 1, 2019
.Dt VNODE 9
.Os
.Sh NAME
.Nm vnode ,
.Nm vref ,
.Nm vrele ,
.Nm vrele_async ,
.Nm vput ,
.Nm vhold ,
.Nm holdrele ,
.Nm vcache_get ,
.Nm vcache_new ,
.Nm vcache_rekey_enter ,
.Nm vcache_rekey_exit ,
.Nm vrecycle ,
.Nm vgone ,
.Nm vgonel ,
.Nm vdead_check ,
.Nm vflush ,
.Nm vaccess ,
.Nm bdevvp ,
.Nm cdevvp ,
.Nm vfinddev ,
.Nm vdevgone ,
.Nm vwakeup ,
.Nm vflushbuf ,
.Nm vinvalbuf ,
.Nm vtruncbuf ,
.Nm vprint
.Nd kernel representation of a file or directory
.Sh SYNOPSIS
.In sys/param.h
.In sys/vnode.h
.Ft void
.Fn vref "struct vnode *vp"
.Ft void
.Fn vrele "struct vnode *vp"
.Ft void
.Fn vrele_async "struct vnode *vp"
.Ft void
.Fn vput "struct vnode *vp"
.Ft void
.Fn vhold "struct vnode *vp"
.Ft void
.Fn holdrele "struct vnode *vp"
.Ft int
.Fn vcache_get "struct mount *mp" "const void *key" "size_t key_len" "struct vnode **vpp"
.Ft int
.Fn vcache_new "struct mount *mp" "struct vnode *dvp" "struct vattr *vap" "kauth_cred_t cred" "void *extra" "struct vnode **vpp"
.Ft int
.Fn vcache_rekey_enter "struct mount *mp" "struct vnode *vp" "const void *old_key" "size_t old_key_len" "const void *new_key" "size_t new_key_len"
.Ft void
.Fn vcache_rekey_exit "struct mount *mp" "struct vnode *vp" "const void *old_key" "size_t old_key_len" "const void *new_key" "size_t new_key_len"
.Ft int
.Fn vrecycle "struct vnode *vp"
.Ft void
.Fn vgone "struct vnode *vp"
.Ft void
.Fn vgonel "struct vnode *vp" "struct lwp *l"
.Ft int
.Fn vdead_check "struct vnode *vp" "int flags"
.Ft int
.Fn vflush "struct mount *mp" "struct vnode *skipvp" "int flags"
.Ft int
.Fn vaccess "enum vtype type" "mode_t file_mode" "uid_t uid" "gid_t gid" "mode_t acc_mode" "kauth_cred_t cred"
.Ft int
.Fn bdevvp "dev_t dev" "struct vnode **vpp"
.Ft int
.Fn cdevvp "dev_t dev" "struct vnode **vpp"
.Ft int
.Fn vfinddev "dev_t dev" "enum vtype type" "struct vnode **vpp"
.Ft void
.Fn vdevgone "int maj" "int minl" "int minh" "enum vtype type"
.Ft void
.Fn vwakeup "struct buf *bp"
.Ft int
.Fn vflushbuf "struct vnode *vp" "int sync"
.Ft int
.Fn vinvalbuf "struct vnode *vp" "int flags" "kauth_cred_t cred" "struct lwp *l" "int slpflag" "int slptimeo"
.Ft int
.Fn vtruncbuf "struct vnode *vp" "daddr_t lbn" "int slpflag" "int slptimeo"
.Ft void
.Fn vprint "const char *label" "struct vnode *vp"
.Sh DESCRIPTION
A
.Em vnode
represents an on-disk file in use by the system.
Each
.Xr vfs 9
file system provides a set of
.Xr vnodeops 9
operations on vnodes, invoked by file-system-independent system calls
and supported by file-system-independent library routines.
.Pp
Each mounted file system provides a vnode for the root of the file
system, via
.Xr VFS_ROOT 9 .
Other vnodes are obtained by
.Xr VOP_LOOKUP 9 .
Users of vnodes usually invoke these indirectly via
.Xr namei 9
to obtain vnodes from paths.
.Pp
Each file system usually maintains a cache mapping recently used inode
numbers, or the equivalent, to vnodes, and a cache mapping recently
used file names to vnodes.
If memory is scarce, the system may decide to
.Em reclaim
an unused cached vnode, calling
.Xr VOP_RECLAIM 9
to remove it from the caches and to free file-system-specific memory
associated with it.
A file system may also choose to immediately reclaim a cached vnode
once it is unused, in
.Xr VOP_INACTIVE 9 ,
if the vnode has been deleted on disk.
.Pp
When a file system retrieves a vnode from a cache, the vnode may not
have any users, and another thread in the system may be simultaneously
deciding to reclaim it.
Thus, to retrieve a vnode from a cache, one must use
.Fn vcache_get ,
not
.Fn vref ,
to acquire the first reference.
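.Pp
As a minimal sketch of this rule, a hypothetical file system
.Dq examplefs
that keys its vnodes by inode number might obtain a referenced and
locked vnode roughly as follows.
The function name and the use of
.Vt ino_t
as the key are assumptions made for illustration only; locking is done
separately with
.Xr vn_lock 9 :
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/* Hypothetical helper: look up or load the vnode for inode "ino". */
static int
examplefs_vget(struct mount *mp, ino_t ino, struct vnode **vpp)
{
	int error;

	/* Take the first reference through the vnode cache, not vref(). */
	error = vcache_get(mp, &ino, sizeof(ino), vpp);
	if (error != 0)
		return error;

	/* The vnode is referenced but not locked; lock it for the caller. */
	error = vn_lock(*vpp, LK_EXCLUSIVE);
	if (error != 0) {
		/* Drop the reference taken above before failing. */
		vrele(*vpp);
		*vpp = NULL;
		return error;
	}
	return 0;
}
.Ed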
.Pp
The vnode has the following structure:
.Bd -literal
struct vnode {
struct uvm_object v_uobj; /* the VM object */
kcondvar_t v_cv; /* synchronization */
voff_t v_size; /* size of file */
voff_t v_writesize; /* new size after write */
int v_iflag; /* VI_* flags */
int v_vflag; /* VV_* flags */
int v_uflag; /* VU_* flags */
int v_numoutput; /* # of pending writes */
int v_writecount; /* ref count of writers */
int v_holdcnt; /* page & buffer refs */
struct mount *v_mount; /* ptr to vfs we are in */
int (**v_op)(void *); /* vnode operations vector */
struct buflists v_cleanblkhd; /* clean blocklist head */
struct buflists v_dirtyblkhd; /* dirty blocklist head */
union {
struct mount *vu_mountedhere;/* ptr to vfs (VDIR) */
struct socket *vu_socket; /* unix ipc (VSOCK) */
struct specnode *vu_specnode; /* device (VCHR, VBLK) */
struct fifoinfo *vu_fifoinfo; /* fifo (VFIFO) */
struct uvm_ractx *vu_ractx; /* read-ahead ctx (VREG) */
} v_un;
enum vtype v_type; /* vnode type */
enum vtagtype v_tag; /* type of underlying data */
void *v_data; /* private data for fs */
struct klist v_klist; /* notes attached to vnode */
};
.Ed
.Pp
Most members of the vnode structure should be treated as opaque and
only manipulated using the proper functions.
There are some rather common exceptions detailed throughout this page.
.Pp
Files and file systems are inextricably linked with the virtual memory
system and
.Em v_uobj
contains the data maintained by the virtual memory system.
For compatibility with code written before the integration of
.Xr uvm 9
into
.Nx ,
C-preprocessor directives are used to alias the members of
.Em v_uobj .
.Pp
Vnode flags are recorded by
.Em v_iflag ,
.Em v_vflag
and
.Em v_uflag .
Valid flags are:
.Pp
.Bl -tag -offset indent -width ".Dv VI_WRMAPDIRTY" -compact
.It Dv VV_ROOT
This vnode is the root of its file system.
.It Dv VV_SYSTEM
This vnode is being used by the kernel; only used to skip quota files in
.Fn vflush .
.It Dv VV_ISTTY
This vnode represents a tty; used when reading dead vnodes.
.It Dv VV_MAPPED
This vnode might have user mappings.
.It Dv VV_MPSAFE
This file system is MP safe.
.It Dv VV_LOCKSWORK
This vnode's file system supports locking.
.It Dv VI_TEXT
This vnode is a pure text prototype.
.It Dv VI_EXECMAP
This vnode has executable mappings.
.It Dv VI_WRMAP
This vnode might have PROT_WRITE user mappings.
.It Dv VI_WRMAPDIRTY
This vnode might have dirty pages due to
.Dv VWRITEMAP .
.It Dv VI_XLOCK
This vnode is currently locked to change underlying type.
.It Dv VI_ONWORKLST
This vnode is on syncer work-list.
.It Dv VI_MARKER
A dummy marker vnode.
.It Dv VI_CLEAN
This vnode has been reclaimed and is no longer attached to a file system.
.It Dv VU_DIROP
This vnode is involved in a directory operation.
This flag is used exclusively by LFS.
.El
.Pp
The
.Dv VI_XLOCK
flag is used to prevent multiple processes from entering
the vnode reclamation code.
It is also used as a flag to indicate that reclamation is in progress.
Before
.Em v_iflag
can be modified, the
.Em v_interlock
mutex must be acquired.
See
.Xr lock 9
for details on the kernel locking API.
.Pp
Each vnode has three reference counts:
.Em v_usecount ,
.Em v_writecount
and
.Em v_holdcnt .
The first is the number of active references within the
kernel to the vnode.
This count is maintained by
.Fn vref ,
.Fn vrele ,
.Fn vrele_async ,
and
.Fn vput .
The second is the number of active references within the kernel to the
vnode performing write access to the file.
It is maintained by the
.Xr open 2
and
.Xr close 2
system calls.
The third is the number of references within the kernel
requiring the vnode to remain active and not be recycled.
This count is maintained by
.Fn vhold
and
.Fn holdrele .
When both the
.Em v_usecount
and
.Em v_holdcnt
reach zero, the vnode is cached.
The transition from the cache is handled by a kernel thread and
.Fn vrecycle .
Access to
.Em v_usecount ,
.Em v_writecount
and
.Em v_holdcnt
is also protected by the
.Em v_interlock
mutex.
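.Pp
The use count can be illustrated with a short sketch in which a
subsystem keeps a vnode pointer in its own data structure.
The structure
.Vt struct example_softc
and both function names are hypothetical; the caller of
.Fn example_attach
is assumed to hold a reference to
.Fa vp
already:
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/vnode.h>

/* Hypothetical per-instance state that remembers a vnode. */
struct example_softc {
	struct vnode *sc_vp;
};

static void
example_attach(struct example_softc *sc, struct vnode *vp)
{
	/* Take our own reference for as long as the pointer is stored. */
	vref(vp);
	sc->sc_vp = vp;
}

static void
example_detach(struct example_softc *sc)
{
	struct vnode *vp = sc->sc_vp;

	sc->sc_vp = NULL;
	if (vp != NULL)
		vrele(vp);	/* vp must be unlocked here */
}
.Ed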
.Pp
The number of pending synchronous and asynchronous writes on the
vnode is recorded in
.Em v_numoutput .
It is used by
.Xr fsync 2
to wait for all writes to complete before returning to the user.
Its value must only be modified at splbio (see
.Xr spl 9 ) .
It does not track the number of dirty buffers attached to the
vnode.
.Pp
The link to the file system which owns the vnode is recorded by
.Em v_mount .
See
.Xr vfsops 9
for further information on file system mount status.
.Pp
The
.Em v_op
pointer points to its vnode operations vector.
This vector describes what operations can be done to the file associated
with the vnode.
The system maintains one vnode operations vector for each file system
type configured into the kernel.
The vnode operations vector contains a pointer to a function for
each operation supported by the file system.
See
.Xr vnodeops 9
for a description of vnode operations.
.Pp
When a user wants a new vnode for another file or wants a valid vnode
which is cached,
.Fn vcache_get
or
.Fn vcache_new
is invoked to allocate a vnode and initialize it for the new file.
.Pp
The type of object the vnode represents is recorded by
.Em v_type .
It is used by generic code to perform checks to ensure operations are
performed on valid file system objects.
Valid types are:
.Pp
.Bl -tag -offset indent -width VFIFO -compact
.It Dv VNON
The vnode has no type.
.It Dv VREG
The vnode represents a regular file.
.It Dv VDIR
The vnode represents a directory.
.It Dv VBLK
The vnode represents a block special device.
.It Dv VCHR
The vnode represents a character special device.
.It Dv VLNK
The vnode represents a symbolic link.
.It Dv VSOCK
The vnode represents a socket.
.It Dv VFIFO
The vnode represents a pipe.
.It Dv VBAD
The vnode represents a bad file (not currently used).
.El
.Pp
Vnode tag types are used by external programs only (e.g.,
.Xr pstat 8 ) ,
and should never be inspected by the kernel.
Their use is deprecated
since new
.Em v_tag
values cannot be defined for loadable file systems.
The
.Em v_tag
member is read-only.
Valid tag types are:
.Pp
.Bl -tag -offset indent -width "VT_FILECORE " -compact
.It Dv VT_NON
non file system
.It Dv VT_UFS
universal file system
.It Dv VT_NFS
network file system
.It Dv VT_MFS
memory file system
.It Dv VT_MSDOSFS
FAT file system
.It Dv VT_LFS
log-structured file system
.It Dv VT_LOFS
loopback file system
.It Dv VT_FDESC
file descriptor file system
.It Dv VT_NULL
null file system layer
.It Dv VT_UMAP
uid/gid remapping file system layer
.It Dv VT_KERNFS
kernel interface file system
.It Dv VT_PROCFS
process interface file system
.It Dv VT_AFS
AFS file system
.It Dv VT_ISOFS
ISO 9660 file system(s)
.It Dv VT_UNION
union file system
.It Dv VT_ADOSFS
Amiga file system
.It Dv VT_EXT2FS
Linux's ext2 file system
.It Dv VT_CODA
Coda file system
.It Dv VT_FILECORE
filecore file system
.It Dv VT_NTFS
Microsoft NT's file system
.It Dv VT_VFS
virtual file system
.It Dv VT_OVERLAY
overlay file system
.It Dv VT_SMBFS
SMB file system
.It Dv VT_PTYFS
pseudo-terminal device file system
.It Dv VT_TMPFS
efficient memory file system
.It Dv VT_UDF
universal disk format file system
.It Dv VT_SYSVBFS
systemV boot file system
.El
.Pp
The vnode lock is acquired by calling
.Xr vn_lock 9
and released by calling
.Xr VOP_UNLOCK 9 .
The reason for this asymmetry is that
.Xr vn_lock 9
is a wrapper for
.Xr VOP_LOCK 9
with extra checks, while the unlocking step usually does not need
additional checks and thus has no wrapper.
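.Pp
A minimal sketch of this pairing is shown below.
The function name is hypothetical, and the call to
.Xr VOP_GETATTR 9
is assumed to take the vnode, an attribute structure and a credential,
as described in
.Xr vnodeops 9 :
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/kauth.h>
#include <sys/vnode.h>

/* Hypothetical helper: read the file size under the vnode lock. */
static int
example_getsize(struct vnode *vp, kauth_cred_t cred, voff_t *sizep)
{
	struct vattr va;
	int error;

	/* Acquire through the wrapper, which performs the extra checks. */
	error = vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	if (error != 0)
		return error;

	error = VOP_GETATTR(vp, &va, cred);

	/* Release directly; unlocking has no wrapper. */
	VOP_UNLOCK(vp);

	if (error == 0)
		*sizep = (voff_t)va.va_size;
	return error;
}
.Ed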
.Pp
The vnode locking operation is complicated because it is used for many
purposes.
Sometimes it is used to bundle a series of vnode operations (see
.Xr vnodeops 9 )
into an atomic group.
Many file systems rely on it to prevent race conditions in updating
file system type specific data structures rather than using their
own private locks.
The vnode lock can operate as a multiple-reader (shared-access) lock
or a single-writer (exclusive-access) lock; however, many current file
system implementations were written assuming only single-writer
locking.
Multiple-reader locking functions equivalently only in the presence
of big-lock SMP locking or a uni-processor machine.
The lock may be held while sleeping.
While the vnode lock is acquired, the holder is guaranteed that the
vnode will not be reclaimed or invalidated.
Most file system functions require that you hold the vnode lock on entry.
See
.Xr lock 9
for details on the kernel locking API.
.Pp
Each file system underlying a vnode allocates its own private area and
hangs it from
.Em v_data .
.Pp
Most functions discussed in this page that operate on vnodes cannot be
called from interrupt context.
The members
.Em v_numoutput ,
.Em v_holdcnt ,
.Em v_dirtyblkhd ,
and
.Em v_cleanblkhd
are modified in interrupt context and must be protected by
.Xr splbio 9
unless it is certain that there is no chance an interrupt handler will
modify them.
The vnode lock must not be acquired within interrupt context.
.Sh FUNCTIONS
.Bl -tag -width compact
.It Fn vref "vp"
Increment
.Em v_usecount
of the vnode
.Em vp .
Any kernel thread or subsystem that uses a vnode (e.g., during the operation
of some algorithm or to store in a data structure) should call
.Fn vref .
.It Fn vrele "vp"
Decrement
.Em v_usecount
of unlocked vnode
.Em vp .
Any code in the system which is using a vnode should call
.Fn vrele
when it is finished with the vnode.
If
.Em v_usecount
of the vnode reaches zero and
.Em v_holdcnt
is greater than zero, the vnode is placed on the holdlist.
If both
.Em v_usecount
and
.Em v_holdcnt
are zero, the vnode is cached.
.It Fn vrele_async "vp"
Asynchronously release the vnode in a different context than the caller,
sometime after the call.
.It Fn vput "vp"
Legacy convenience routine for unlocking and releasing
.Fa vp .
Equivalent to:
.Bd -literal -offset abcd
.No "VOP_UNLOCK(" Ns Fa vp Ns Li ");"
.No "vrele(" Ns Fa vp Ns Li ");"
.Ed
.Pp
New code should prefer using
.Xr VOP_UNLOCK 9
and
.Fn vrele
directly.
.It Fn vhold "vp"
Mark the vnode
.Fa vp
as active by incrementing
.Em vp->v_holdcnt .
Once held, the vnode will not be recycled until it is
released with
.Fn holdrele .
.It Fn holdrele "vp"
Mark the vnode
.Fa vp
as inactive by decrementing
.Em vp->v_holdcnt .
.It Fn vcache_get "mp" "key" "key_len" "vpp"
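Retrieve the vnode identified by
.Fa key
from the vnode cache, allocating and initializing a new vnode if it is
not already cached.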
The new vnode is returned referenced in the address specified by
.Fa vpp .
.Pp
The argument
.Fa mp
is the mount point for the file system in which to look up the file.
.Pp
The arguments
.Fa key
and
.Fa key_len
uniquely identify the file in the file system.
.Pp
If a vnode is successfully retrieved zero is returned, otherwise an
appropriate error code is returned.
.It Fn vcache_new "mp" "dvp" "vap" "cred" "extra" "vpp"
Allocate a new vnode with a new file.
The new vnode is returned referenced in the address specified by
.Fa vpp .
.Pp
The argument
.Fa mp
is the mount point for the file system to create the file in.
.Pp
The argument
.Fa dvp
points to the directory to create the file in.
.Pp
The argument
.Fa vap
points to the attributes for the file to create.
.Pp
The argument
.Fa cred
holds the credentials for the file to create.
.Pp
The argument
.Fa extra
allows the caller to pass more information about the file to create.
.Pp
If a vnode is successfully created zero is returned, otherwise an
appropriate error code is returned.
.It Fn vcache_rekey_enter "mp" "vp" "old_key" "old_key_len" "new_key" "new_key_len"
Prepare to change the key of a cached vnode.
.Pp
The argument
.Fa mp
is the mount point for the file system the vnode
.Fa vp
resides in.
.Pp
The arguments
.Fa old_key
and
.Fa old_key_len
identify the cached vnode.
.Pp
The arguments
.Fa new_key
and
.Fa new_key_len
will identify the vnode after rename.
.Pp
If the new key already exists
.Er EEXIST
is returned, otherwise zero is returned.
.It Fn vcache_rekey_exit "mp" "vp" "old_key" "old_key_len" "new_key" "new_key_len"
Finish rename after calling
.Fn vcache_rekey_enter .
.It Fn vrecycle "vp"
Recycle the referenced vnode
.Fa vp
if this is the last reference.
.Fn vrecycle
is a null operation if the reference count is greater than one.
.It Fn vgone "vp"
Eliminate all activity associated with the unlocked vnode
.Fa vp
in preparation for recycling.
This operation is restricted to suspended file systems.
See
.Xr vfs_suspend 9 .
.It Fn vgonel "vp" "l"
Eliminate all activity associated with the locked vnode
.Fa vp
in preparation for recycling.
.It Fn vdead_check "vp" "flags"
Check the vnode
.Fa vp
for being or becoming dead.
Returns
.Er ENOENT
for a dead vnode and zero otherwise.
If
.Fa flags
is
.Dv VDEAD_NOWAIT
it will return
.Er EBUSY
if the vnode is becoming dead and the function will not sleep.
.Pp
Whenever this function returns a non-zero value all future calls
for this
.Fa vp
will also return a non-zero value.
.It Fn vflush "mp" "skipvp" "flags"
Remove any vnodes in the vnode table belonging to mount point
.Fa mp .
If
.Fa skipvp
is not
.Dv NULL
it is exempt from being flushed.
The argument
.Fa flags
is a set of flags modifying the operation of
.Fn vflush .
If
.Dv FORCECLOSE
is not specified, there should not be any active vnodes and
the error
.Er EBUSY
is returned if any are found (this is a user error, not a system error).
If
.Dv FORCECLOSE
is specified, active vnodes that are found are detached.
If
.Dv WRITECLOSE
is set, only flush out regular file vnodes open for writing.
If
.Dv SKIPSYSTEM
is set, any vnodes marked
.Dv VV_SYSTEM
are skipped.
.It Fn vaccess "type" "file_mode" "uid" "gid" "acc_mode" "cred"
Do access checking by comparing the file's permissions to the caller's
desired access type
.Fa acc_mode
and credentials
.Fa cred .
.It Fn bdevvp "dev" "vpp"
Create a vnode for a block device.
.Fn bdevvp
is used for root file systems, swap areas and for memory file system
special devices.
.It Fn cdevvp "dev" "vpp"
Create a vnode for a character device.
.Fn cdevvp
is used for the console and kernfs special devices.
.It Fn vfinddev "dev" "type" "vpp"
Look up a vnode by device number.
The vnode is referenced and returned in the address specified by
.Fa vpp .
.It Fn vdevgone "maj" "minl" "minh" "type"
Reclaim all vnodes that correspond to the specified minor number range
.Fa minl
to
.Fa minh
(endpoints inclusive) of the specified major
.Fa maj .
.It Fn vwakeup "bp"
Update outstanding I/O count
.Em vp->v_numoutput
for the vnode
.Em bp->b_vp
and do a wakeup if requested and
.Em vp->vflag
has
.Dv VBWAIT
set.
.It Fn vflushbuf "vp" "sync"
Flush all dirty buffers to disk for the file with the locked vnode
.Fa vp .
The argument
.Fa sync
specifies whether the I/O should be synchronous and
.Fn vflushbuf
will sleep until
.Em vp->v_numoutput
is zero and
.Em vp->v_dirtyblkhd
is empty.
.It Fn vinvalbuf "vp" "flags" "cred" "l" "slpflag" "slptimeo"
Flush out and invalidate all buffers associated with locked vnode
.Fa vp .
The arguments
.Fa l
and
.Fa cred
specify the calling process and its credentials.
The
.Xr ltsleep 9
flag and timeout are specified by the arguments
.Fa slpflag
and
.Fa slptimeo
respectively.
If the operation is successful zero is returned, otherwise an
appropriate error code is returned.
.It Fn vtruncbuf "vp" "lbn" "slpflag" "slptimeo"
Destroy any in-core buffers past the file truncation length for the
locked vnode
.Fa vp .
The truncation length is specified by
.Fa lbn .
.Fn vtruncbuf
will sleep while the I/O is performed.
The
.Xr ltsleep 9
flag and timeout are specified by the arguments
.Fa slpflag
and
.Fa slptimeo
respectively.
If the operation is successful zero is returned, otherwise an
appropriate error code is returned.
.It Fn vprint "label" "vp"
This function is used by the kernel to dump vnode information during a
panic.
It is only used if the kernel option DIAGNOSTIC is compiled into the kernel.
The argument
.Fa label
is a string to prefix the information dump of vnode
.Fa vp .
.El
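.Sh EXAMPLES
The following sketch shows how the unmount routine of a hypothetical
file system might use
.Fn vflush
as described above.
The function name is an assumption made for illustration, and the
file-system-specific teardown is omitted:
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/* Hypothetical unmount routine for "examplefs". */
static int
examplefs_unmount(struct mount *mp, int mntflags)
{
	int error, flags = 0;

	/* A forced unmount detaches vnodes that are still active. */
	if (mntflags & MNT_FORCE)
		flags |= FORCECLOSE;

	/* No vnode is exempt from flushing, so skipvp is NULL. */
	error = vflush(mp, NULL, flags);
	if (error != 0)
		return error;	/* e.g. EBUSY without FORCECLOSE */

	/* File-system-specific teardown would follow here. */
	return 0;
}
.Ed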
.Sh CODE REFERENCES
The vnode framework is implemented within the file
.Pa sys/kern/vfs_subr.c .
.Sh SEE ALSO
.Xr intro 9 ,
.Xr lock 9 ,
.Xr namecache 9 ,
.Xr namei 9 ,
.Xr uvm 9 ,
.Xr vattr 9 ,
.Xr vfs 9 ,
.Xr vfsops 9 ,
.Xr vnodeops 9 ,
.Xr vnsubr 9
.Sh BUGS
The locking protocol is inconsistent.
Many vnode operations are passed locked vnodes on entry but release
the lock before they exit.
The locking protocol is used in some places to attempt to make a
series of operations atomic (e.g., access check then operation).
This does not work for non-local file systems that do not support locking
(e.g., NFS).
The
.Nm
interface would benefit from a simpler locking protocol.