394 lines
11 KiB
Groff
394 lines
11 KiB
Groff
.\" $NetBSD: kmem.9,v 1.23 2017/11/07 18:36:27 christos Exp $
|
|
.\"
|
|
.\" Copyright (c)2006 YAMAMOTO Takashi,
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
.\" SUCH DAMAGE.
|
|
.\"
|
|
.\" ------------------------------------------------------------
|
|
.Dd November 7, 2017
|
|
.Dt KMEM 9
|
|
.Os
|
|
.\" ------------------------------------------------------------
|
|
.Sh NAME
|
|
.Nm kmem
|
|
.Nd kernel wired memory allocator
|
|
.\" ------------------------------------------------------------
|
|
.Sh SYNOPSIS
|
|
.In sys/kmem.h
|
|
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
|
.Ft void *
|
|
.Fn kmem_alloc \
|
|
"size_t size" "km_flag_t kmflags"
|
|
.Ft void *
|
|
.Fn kmem_zalloc \
|
|
"size_t size" "km_flag_t kmflags"
|
|
.Ft void
|
|
.Fn kmem_free \
|
|
"void *p" "size_t size"
|
|
.\" ---
|
|
.Ft void *
|
|
.Fn kmem_intr_alloc \
|
|
"size_t size" "km_flag_t kmflags"
|
|
.Ft void *
|
|
.Fn kmem_intr_zalloc \
|
|
"size_t size" "km_flag_t kmflags"
|
|
.Ft void
|
|
.Fn kmem_intr_free \
|
|
"void *p" "size_t size"
|
|
.\" ---
|
|
.Ft char *
|
|
.Fn kmem_asprintf \
|
|
"const char *fmt" "..."
|
|
.\" ---
|
|
.Ft char *
|
|
.Fn kmem_strdupsize \
|
|
"const char *str" "size_t *size" "km_flag_t kmflags"
|
|
.Ft void
|
|
.Fn kmem_strfree \
|
|
"char *str"
|
|
.\" ------------------------------------------------------------
|
|
.Pp
|
|
.Cd "options KMEM_SIZE"
|
|
.Cd "options KMEM_REDZONE"
|
|
.Cd "options KMEM_GUARD"
|
|
.Sh DESCRIPTION
|
|
.Fn kmem_alloc
|
|
allocates kernel wired memory.
|
|
It takes the following arguments.
|
|
.Bl -tag -width kmflags
|
|
.It Fa size
|
|
Specify the size of allocation in bytes.
|
|
.It Fa kmflags
|
|
Either of the following:
|
|
.Bl -tag -width KM_NOSLEEP
|
|
.It Dv KM_SLEEP
|
|
If the allocation cannot be satisfied immediately, sleep until enough
|
|
memory is available.
|
|
If
|
|
.Dv KM_SLEEP
|
|
is specified, then the allocation cannot fail.
|
|
.It Dv KM_NOSLEEP
|
|
Don't sleep.
|
|
Immediately return
|
|
.Dv NULL
|
|
if there is not enough memory available.
|
|
It should only be used when failure to allocate will not have harmful,
|
|
user-visible effects.
|
|
.Pp
|
|
.Bf -symbolic
|
|
Use of
|
|
.Dv KM_NOSLEEP
|
|
is strongly discouraged as it can create transient, hard to debug failures
|
|
that occur when the system is under memory pressure.
|
|
.Ef
|
|
.Pp
|
|
In situations where it is not possible to sleep, for example because locks
|
|
are held by the caller, the code path should be restructured to allow the
|
|
allocation to be made in another place.
|
|
.El
|
|
.El
|
|
.Pp
|
|
The contents of allocated memory are uninitialized.
|
|
.Pp
|
|
Unlike Solaris, kmem_alloc(0, flags) is illegal.
|
|
.Pp
|
|
.\" ------------------------------------------------------------
|
|
.Fn kmem_zalloc
|
|
is the equivalent of
|
|
.Fn kmem_alloc ,
|
|
except that it initializes the memory to zero.
|
|
.Pp
|
|
.\" ------------------------------------------------------------
|
|
.Fn kmem_asprintf
|
|
functions as the well known
|
|
.Fn asprintf
|
|
function, but allocates memory using
|
|
.Fn kmem_alloc .
|
|
This routine can sleep during allocation.
|
|
The size of the allocated area is the length of the returned character string, plus one (for the NUL terminator).
|
|
This must be taken into consideration when freeing the returned area with
|
|
.Fn kmem_free .
|
|
.Pp
|
|
.\" ------------------------------------------------------------
|
|
.Fn kmem_free
|
|
frees kernel wired memory allocated by
|
|
.Fn kmem_alloc
|
|
or
|
|
.Fn kmem_zalloc
|
|
so that it can be used for other purposes.
|
|
It takes the following arguments.
|
|
.Bl -tag -width kmflags
|
|
.It Fa p
|
|
The pointer to the memory being freed.
|
|
It must be the one returned by
|
|
.Fn kmem_alloc
|
|
or
|
|
.Fn kmem_zalloc .
|
|
.It Fa size
|
|
The size of the memory being freed, in bytes.
|
|
It must be the same as the
|
|
.Fa size
|
|
argument used for
|
|
.Fn kmem_alloc
|
|
or
|
|
.Fn kmem_zalloc
|
|
when the memory was allocated.
|
|
.El
|
|
.Pp
|
|
Freeing
|
|
.Dv NULL
|
|
is illegal.
|
|
.Pp
|
|
.\" ------------------------------------------------------------
|
|
.Fn kmem_intr_alloc ,
|
|
.Fn kmem_intr_zalloc
|
|
and
|
|
.Fn kmem_intr_free
|
|
are the equivalents of the above kmem routines which can be called
|
|
from the interrupt context.
|
|
These routines are for the special cases.
|
|
Normally,
|
|
.Xr pool_cache 9
|
|
should be used for memory allocation from interrupt context.
|
|
.Pp
|
|
The
|
|
.Fn kmem_strdupsize
|
|
function is a utility function that can be used to copy the string in the
|
|
.Fa str
|
|
argument to a new buffer allocated using
|
|
.Fn kmem_alloc
|
|
and optionally return the size of the allocation (the length of the string
|
|
plus the trailing
|
|
.Dv NUL )
|
|
in the
|
|
.Fa size
|
|
argument if that is not
|
|
.Dv NULL .
|
|
.Pp
|
|
The
|
|
.Fn kmem_strfree
|
|
function can be used to free a
|
|
.Dv NUL
|
|
terminated string computing the length of the string using
|
|
.Xr strlen 3
|
|
and adding one for the
|
|
.Dv NUL
|
|
and then using
|
|
.Fn kmem_free .
|
|
.\" ------------------------------------------------------------
|
|
.Sh NOTES
|
|
Making
|
|
.Dv KM_SLEEP
|
|
allocations while holding mutexes or reader/writer locks is discouraged, as the
|
|
caller can sleep for an unbounded amount of time in order to satisfy the
|
|
allocation.
|
|
This can in turn block other threads that wish to acquire locks held by the
|
|
caller.
|
|
It should be noted that
|
|
.Fn kmem_free
|
|
may also block.
|
|
.Pp
|
|
For some locks this is permissible or even unavoidable.
|
|
For others, particularly locks that may be taken from soft interrupt context,
|
|
it is a serious problem.
|
|
As a general rule it is better not to allow this type of situation to develop.
|
|
One way to circumvent the problem is to make allocations speculative and part
|
|
of a retryable sequence.
|
|
For example:
|
|
.Bd -literal
|
|
retry:
|
|
/* speculative unlocked check */
|
|
if (need to allocate) {
|
|
new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
|
|
} else {
|
|
new_item = NULL;
|
|
}
|
|
mutex_enter(lock);
|
|
/* check while holding lock for true status */
|
|
if (need to allocate) {
|
|
if (new_item == NULL) {
|
|
mutex_exit(lock);
|
|
goto retry;
|
|
}
|
|
consume(new_item);
|
|
new_item = NULL;
|
|
}
|
|
mutex_exit(lock);
|
|
if (new_item != NULL) {
|
|
/* did not use it after all */
|
|
kmem_free(new_item, sizeof(*new_item));
|
|
}
|
|
.Ed
|
|
.\" ------------------------------------------------------------
|
|
.Sh OPTIONS
|
|
.Ss KMEM_SIZE
|
|
Kernels compiled with the
|
|
.Dv KMEM_SIZE
|
|
option ensure the size given in
|
|
.Fn kmem_free
|
|
matches the actual allocated size.
|
|
On
|
|
.Fn kmem_alloc ,
|
|
the kernel will allocate an additional contiguous kmem page of eight
|
|
bytes in the buffer, will register the allocated size in the first kmem
|
|
page of that buffer, and will return a pointer to the second kmem page
|
|
in that same buffer.
|
|
When freeing, the kernel reads the first page, and compares the
|
|
size registered with the one given in
|
|
.Fn kmem_free .
|
|
Any mismatch triggers a panic.
|
|
.Pp
|
|
.Dv KMEM_SIZE
|
|
is enabled by default on
|
|
.Dv DIAGNOSTIC
|
|
and
|
|
.Dv DEBUG .
|
|
.Ss KMEM_REDZONE
|
|
Kernels compiled with the
|
|
.Dv KMEM_REDZONE
|
|
option add a dynamic pattern of two bytes at the end of each allocated
|
|
buffer, and check this pattern when freeing to ensure the caller hasn't
|
|
written outside the requested area.
|
|
This option does not introduce a significant performance impact,
|
|
but has two drawbacks: it only catches write overflows, and catches
|
|
them only on
|
|
.Fn kmem_free .
|
|
.Pp
|
|
.Dv KMEM_REDZONE
|
|
is enabled by default on
|
|
.Dv DIAGNOSTIC .
|
|
.Ss KMEM_GUARD
|
|
Kernels compiled with the
|
|
.Dv KMEM_GUARD
|
|
option perform CPU intensive sanity checks on kmem operations.
|
|
It adds additional, very high overhead runtime verification to kmem
|
|
operations.
|
|
It must be enabled with
|
|
.Dv KMEM_SIZE .
|
|
.Pp
|
|
.Dv KMEM_GUARD
|
|
tries to catch the following types of bugs:
|
|
.Bl -bullet
|
|
.It
|
|
Overflow at time of occurrence, by means of a guard page.
|
|
An unmapped guard page sits immediately after the requested area;
|
|
a read/write overflow therefore triggers a page fault.
|
|
.It
|
|
Underflow at
|
|
.Fn kmem_free ,
|
|
by using
|
|
.Dv KMEM_SIZE Ap s
|
|
registered size.
|
|
If an underflow occurs, the size stored by
|
|
.Dv KMEM_SIZE
|
|
will be overwritten, which means that when freeing, the kernel will
|
|
spot the mismatch.
|
|
.It
|
|
Use-after-free at time of occurrence.
|
|
When freeing, the memory is unmapped, and depending on the value
|
|
of kmem_guard_depth, the kernel will more or less delay the recycling
|
|
of that memory.
|
|
Which means that any ulterior read/write access to the memory will
|
|
trigger a page fault, given it hasn't been recycled yet.
|
|
.El
|
|
.Pp
|
|
To enable it, boot the system with the
|
|
.Fl d
|
|
option, which causes the debugger to be entered early during the kernel
|
|
boot process.
|
|
Issue commands such as the following:
|
|
.Bd -literal
|
|
db> w kmem_guard_depth 0t30000
|
|
db> c
|
|
.Ed
|
|
.Pp
|
|
This instructs
|
|
.Dv kmem_guard
|
|
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
|
|
use-after-free type errors.
|
|
When
|
|
.Fn kmem_free
|
|
is called, memory backing a freed item is unmapped and the kernel VA
|
|
space pushed onto a FIFO.
|
|
The VA space will not be reused until another 30k items have been freed.
|
|
Until reused the kernel will catch invalid accesses and panic with a page fault.
|
|
Limitations:
|
|
.Bl -bullet
|
|
.It
|
|
It has a severe impact on performance.
|
|
.It
|
|
It is best used on a 64-bit machine with lots of RAM.
|
|
.El
|
|
.Pp
|
|
.Dv KMEM_GUARD
|
|
is enabled by default on
|
|
.Dv DEBUG .
|
|
.Sh RETURN VALUES
|
|
On success,
|
|
.Fn kmem_alloc ,
|
|
.Fn kmem_asprintf ,
|
|
.Fn kmem_intr_alloc ,
|
|
.Fn kmem_intr_zalloc ,
|
|
.Fn kmem_strdupsize ,
|
|
and
|
|
.Fn kmem_zalloc
|
|
return a pointer to allocated memory.
|
|
Otherwise,
|
|
.Dv NULL
|
|
is returned.
|
|
.\" ------------------------------------------------------------
|
|
.Sh CODE REFERENCES
|
|
The
|
|
.Nm
|
|
subsystem is implemented within the file
|
|
.Pa sys/kern/subr_kmem.c .
|
|
.\" ------------------------------------------------------------
|
|
.Sh SEE ALSO
|
|
.Xr intro 9 ,
|
|
.Xr memoryallocators 9 ,
|
|
.Xr percpu 9 ,
|
|
.Xr pool_cache 9 ,
|
|
.Xr uvm_km 9
|
|
.\" ------------------------------------------------------------
|
|
.Sh CAVEATS
|
|
The
|
|
.Fn kmem_alloc ,
|
|
.Fn kmem_asprintf ,
|
|
.Fn kmem_free ,
|
|
.Fn kmem_strdupsize ,
|
|
.Fn kmem_strfree ,
|
|
and
|
|
.Fn kmem_zalloc
|
|
functions cannot be used from interrupt context, from a soft interrupt,
|
|
or from a callout.
|
|
Use
|
|
.Xr pool_cache 9
|
|
in these situations.
|
|
.\" ------------------------------------------------------------
|
|
.Sh SECURITY CONSIDERATIONS
|
|
As the memory allocated by
|
|
.Fn kmem_alloc
|
|
is uninitialized, it can contain security-sensitive data left by its
|
|
previous user.
|
|
It is the caller's responsibility not to expose it to the world.
|