Overhaul memory management README.
The README was written as a "historical account", and that style hasn't aged particularly well. Rephrase it to describe the current situation, instead of having various version-specific comments.

This also updates the description of how allocated chunks are associated with their corresponding context, the method of which has changed in the preceding commit.

Author: Andres Freund
Discussion: https://postgr.es/m/20170228074420.aazv4iw6k562mnxg@alap3.anarazel.de
Parent: 7e3aa03b41
Commit: f4e2d50cd7
@ -1,15 +1,7 @@
|
|||||||
src/backend/utils/mmgr/README
|
src/backend/utils/mmgr/README
|
||||||
|
|
||||||
Notes About Memory Allocation Redesign
|
Memory Context System Design Overview
|
||||||
======================================
|
=====================================
|
||||||
|
|
||||||
Up through version 7.0, Postgres had serious problems with memory leakage
|
|
||||||
during large queries that process a lot of pass-by-reference data. There
|
|
||||||
was no provision for recycling memory until end of query. This needed to be
|
|
||||||
fixed, even more so with the advent of TOAST which allows very large chunks
|
|
||||||
of data to be passed around in the system. This document describes the new
|
|
||||||
memory management system implemented in 7.1.
|
|
||||||
|
|
||||||
|
|
||||||
Background
|
Background
|
||||||
----------
|
----------
|
||||||
@ -38,10 +30,10 @@ to or get more memory from the same context the chunk was originally
|
|||||||
allocated in.
|
allocated in.
|
||||||
|
|
||||||
At all times there is a "current" context denoted by the
|
At all times there is a "current" context denoted by the
|
||||||
CurrentMemoryContext global variable. The backend macro palloc()
|
CurrentMemoryContext global variable. palloc() implicitly allocates space
|
||||||
implicitly allocates space in that context. The MemoryContextSwitchTo()
|
in that context. The MemoryContextSwitchTo() operation selects a new current
|
||||||
operation selects a new current context (and returns the previous context,
|
context (and returns the previous context, so that the caller can restore the
|
||||||
so that the caller can restore the previous context before exiting).
|
previous context before exiting).
|
||||||
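The switch-and-restore convention described above can be sketched in standalone C. This is only a toy model, not the backend's code: ToyContext, toy_current_context, toy_switch_to() and toy_work_in() are hypothetical names standing in for MemoryContext, CurrentMemoryContext, MemoryContextSwitchTo() and an arbitrary caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory context; only its identity matters. */
typedef struct ToyContext
{
    const char *name;
} ToyContext;

/* Plays the role of the CurrentMemoryContext global variable. */
static ToyContext *toy_current_context = NULL;

/* Select a new current context and hand back the previous one. */
static ToyContext *
toy_switch_to(ToyContext *context)
{
    ToyContext *old = toy_current_context;

    assert(context != NULL);
    toy_current_context = context;
    return old;
}

/*
 * Typical caller pattern: switch, do the work that should allocate in
 * the new current context, then restore whatever was current before.
 * Returns the name of the context that was current during the work.
 */
static const char *
toy_work_in(ToyContext *context)
{
    ToyContext *oldcontext = toy_switch_to(context);
    const char *used = toy_current_context->name;

    toy_switch_to(oldcontext);
    return used;
}
```

Because the previous context is returned rather than stored in the context itself, nested callers can each keep their own "oldcontext" local and restore it independently.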
|
|
||||||
The main advantage of memory contexts over plain use of malloc/free is
|
The main advantage of memory contexts over plain use of malloc/free is
|
||||||
that the entire contents of a memory context can be freed easily, without
|
that the entire contents of a memory context can be freed easily, without
|
||||||
@ -60,8 +52,10 @@ The behavior of palloc and friends is similar to the standard C library's
|
|||||||
malloc and friends, but there are some deliberate differences too. Here
|
malloc and friends, but there are some deliberate differences too. Here
|
||||||
are some notes to clarify the behavior.
|
are some notes to clarify the behavior.
|
||||||
|
|
||||||
* If out of memory, palloc and repalloc exit via elog(ERROR). They never
|
* If out of memory, palloc and repalloc exit via elog(ERROR). They
|
||||||
return NULL, and it is not necessary or useful to test for such a result.
|
never return NULL, and it is not necessary or useful to test for such
|
||||||
|
a result. With palloc_extended() that behavior can be overridden
|
||||||
|
using the MCXT_ALLOC_NO_OOM flag.
|
||||||
|
|
||||||
* palloc(0) is explicitly a valid operation. It does not return a NULL
|
* palloc(0) is explicitly a valid operation. It does not return a NULL
|
||||||
pointer, but a valid chunk of which no bytes may be used. However, the
|
pointer, but a valid chunk of which no bytes may be used. However, the
|
||||||
@ -71,28 +65,18 @@ error. Similarly, repalloc allows realloc'ing to zero size.
|
|||||||
* pfree and repalloc do not accept a NULL pointer. This is intentional.
|
* pfree and repalloc do not accept a NULL pointer. This is intentional.
|
||||||
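The out-of-memory and zero-size rules in the notes above can be illustrated with a small standalone model. It is a sketch under stated assumptions, not PostgreSQL's allocator: toy_alloc_extended(), TOY_ALLOC_NO_OOM and TOY_MEMORY_LIMIT are hypothetical, the error exit is imitated with setjmp/longjmp, and "out of memory" is simulated by an artificial size limit.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALLOC_NO_OOM  0x01  /* hypothetical analogue of MCXT_ALLOC_NO_OOM */
#define TOY_MEMORY_LIMIT  1024  /* requests above this "run out of memory" */

/* Where control lands when an allocation fails without the NO_OOM flag. */
static jmp_buf toy_error_handler;

/*
 * Allocate size bytes.  On failure, either jump to the error handler
 * (default) or return NULL (when TOY_ALLOC_NO_OOM is given).  A request
 * for zero bytes still yields a valid, non-NULL chunk.
 */
static void *
toy_alloc_extended(size_t size, int flags)
{
    void *chunk = NULL;

    /* TOY_ALLOC_NO_OOM is the only flag this sketch knows about. */
    assert((flags & ~TOY_ALLOC_NO_OOM) == 0);

    if (size <= TOY_MEMORY_LIMIT)
        chunk = malloc(size > 0 ? size : 1);
    if (chunk == NULL && (flags & TOY_ALLOC_NO_OOM) == 0)
        longjmp(toy_error_handler, 1);
    return chunk;
}

/* Returns 1 if the allocation exited via the error path, else 0. */
static int
toy_alloc_raises(size_t size, int flags)
{
    if (setjmp(toy_error_handler) != 0)
        return 1;               /* arrived here via longjmp */
    free(toy_alloc_extended(size, flags));
    return 0;
}
```

In this model only callers that pass the flag ever need to test the result for NULL; everyone else can use the returned pointer unconditionally.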
|
|
||||||
|
|
||||||
pfree/repalloc No Longer Depend On CurrentMemoryContext
|
The Current Memory Context
|
||||||
-------------------------------------------------------
|
--------------------------
|
||||||
|
|
||||||
Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk
|
Because it would be too much notational overhead to always pass an
|
||||||
whether it belongs to CurrentMemoryContext or not --- the chunk's owning
|
appropriate memory context to called routines, there always exists the
|
||||||
context will be invoked to handle the operation, regardless. This is a
|
notion of the current memory context CurrentMemoryContext. Without it,
|
||||||
change from the old requirement that CurrentMemoryContext must be set
|
for example, the copyObject routines would need to be passed a context, as
|
||||||
to the same context the memory was allocated from before one can use
|
would function execution routines that return a pass-by-reference
|
||||||
pfree() or repalloc().
|
datatype. Similarly for routines that temporarily allocate space
|
||||||
|
internally, but don't return it to their caller. We certainly don't
|
||||||
There was some consideration of getting rid of CurrentMemoryContext entirely,
|
want to clutter every call in the system with "here is a context to
|
||||||
instead requiring the target memory context for allocation to be specified
|
use for any temporary memory allocation you might want to do".
|
||||||
explicitly. But we decided that would be too much notational overhead ---
|
|
||||||
we'd have to pass an appropriate memory context to called routines in
|
|
||||||
many places. For example, the copyObject routines would need to be passed
|
|
||||||
a context, as would function execution routines that return a
|
|
||||||
pass-by-reference datatype. And what of routines that temporarily
|
|
||||||
allocate space internally, but don't return it to their caller? We
|
|
||||||
certainly don't want to clutter every call in the system with "here is
|
|
||||||
a context to use for any temporary memory allocation you might want to
|
|
||||||
do". So there'd still need to be a global variable specifying a suitable
|
|
||||||
temporary-allocation context. That might as well be CurrentMemoryContext.
|
|
||||||
|
|
||||||
The upshot of that reasoning, though, is that CurrentMemoryContext should
|
The upshot of that reasoning, though, is that CurrentMemoryContext should
|
||||||
generally point at a short-lifespan context if at all possible. During
|
generally point at a short-lifespan context if at all possible. During
|
||||||
@ -102,42 +86,83 @@ context having greater than transaction lifespan, since doing so risks
|
|||||||
permanent memory leaks.
|
permanent memory leaks.
|
||||||
|
|
||||||
|
|
||||||
Additions to the Memory-Context Mechanism
|
pfree/repalloc Do Not Depend On CurrentMemoryContext
|
||||||
-----------------------------------------
|
----------------------------------------------------
|
||||||
|
|
||||||
Before 7.1 memory contexts were all independent, but it was too hard to
|
pfree() and repalloc() can be applied to any chunk whether it belongs
|
||||||
keep track of them; with lots of contexts there needs to be explicit
|
to CurrentMemoryContext or not --- the chunk's owning context will be
|
||||||
mechanism for that.
|
invoked to handle the operation, regardless.
|
||||||
|
|
||||||
We solved this by creating a tree of "parent" and "child" contexts. When
|
|
||||||
creating a memory context, the new context can be specified to be a child
|
|
||||||
of some existing context. A context can have many children, but only one
|
|
||||||
parent. In this way the contexts form a forest (not necessarily a single
|
|
||||||
tree, since there could be more than one top-level context; although in
|
|
||||||
current practice there is only one top context, TopMemoryContext).
|
|
||||||
|
|
||||||
We then say that resetting or deleting any particular context resets or
|
"Parent" and "Child" Contexts
|
||||||
deletes all its direct and indirect children as well. This feature allows
|
-----------------------------
|
||||||
us to manage a lot of contexts without fear that some will be leaked; we
|
|
||||||
only need to keep track of one top-level context that we are going to
|
|
||||||
delete at transaction end, and make sure that any shorter-lived contexts
|
|
||||||
we create are descendants of that context. Since the tree can have
|
|
||||||
multiple levels, we can deal easily with nested lifetimes of storage,
|
|
||||||
such as per-transaction, per-statement, per-scan, per-tuple. Storage
|
|
||||||
lifetimes that only partially overlap can be handled by allocating
|
|
||||||
from different trees of the context forest (there are some examples
|
|
||||||
in the next section).
|
|
||||||
|
|
||||||
Actually, it turns out that resetting a given context should almost
|
If all contexts were independent, it'd be hard to keep track of them,
|
||||||
always imply deleting, not just resetting, any child contexts it has.
|
especially in error cases. This is solved by creating a tree of
|
||||||
So MemoryContextReset() means that, and if you really do want a tree of
|
"parent" and "child" contexts. When creating a memory context, the
|
||||||
empty contexts you need to call MemoryContextResetOnly() plus
|
new context can be specified to be a child of some existing context.
|
||||||
MemoryContextResetChildren().
|
A context can have many children, but only one parent. In this way
|
||||||
|
the contexts form a forest (not necessarily a single tree, since there
|
||||||
|
could be more than one top-level context; although in current practice
|
||||||
|
there is only one top context, TopMemoryContext).
|
||||||
|
|
||||||
|
Deleting a context deletes all its direct and indirect children as
|
||||||
|
well. When resetting a context it's almost always more useful to
|
||||||
|
delete child contexts, thus MemoryContextReset() means that, and if
|
||||||
|
you really do want a tree of empty contexts you need to call
|
||||||
|
MemoryContextResetOnly() plus MemoryContextResetChildren().
|
||||||
|
|
||||||
|
These features allow us to manage a lot of contexts without fear that
|
||||||
|
some will be leaked; we only need to keep track of one top-level
|
||||||
|
context that we are going to delete at transaction end, and make sure
|
||||||
|
that any shorter-lived contexts we create are descendants of that
|
||||||
|
context. Since the tree can have multiple levels, we can deal easily
|
||||||
|
with nested lifetimes of storage, such as per-transaction,
|
||||||
|
per-statement, per-scan, per-tuple. Storage lifetimes that only
|
||||||
|
partially overlap can be handled by allocating from different trees of
|
||||||
|
the context forest (there are some examples in the next section).
|
||||||
|
|
||||||
For convenience we also provide operations like "reset/delete all children
|
For convenience we also provide operations like "reset/delete all children
|
||||||
of a given context, but don't reset or delete that context itself".
|
of a given context, but don't reset or delete that context itself".
|
||||||
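The parent/child rules above can be sketched as a small standalone tree. This is a toy model only: ToyContext and the toy_* functions are hypothetical, the field names parent, firstchild and nextchild merely echo the ones used for context links, and no memory is actually managed beyond the context nodes themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyContext
{
    struct ToyContext *parent;      /* NULL for a top-level context */
    struct ToyContext *firstchild;  /* head of linked list of children */
    struct ToyContext *nextchild;   /* next child of the same parent */
    const char *name;
} ToyContext;

/* Number of contexts currently in existence, for demonstration only. */
static int toy_live_contexts = 0;

/* Create a context, optionally as a child of an existing one. */
static ToyContext *
toy_create(ToyContext *parent, const char *name)
{
    ToyContext *context = calloc(1, sizeof(ToyContext));

    if (context == NULL)
        abort();
    context->name = name;
    context->parent = parent;
    if (parent != NULL)
    {
        context->nextchild = parent->firstchild;
        parent->firstchild = context;
    }
    toy_live_contexts++;
    return context;
}

/* Delete a context together with all its direct and indirect children. */
static void
toy_delete(ToyContext *context)
{
    assert(context != NULL);

    /* Each recursive call unlinks the child, so this loop terminates. */
    while (context->firstchild != NULL)
        toy_delete(context->firstchild);

    if (context->parent != NULL)
    {
        ToyContext **link = &context->parent->firstchild;

        while (*link != context)
            link = &(*link)->nextchild;
        *link = context->nextchild;
    }
    free(context);
    toy_live_contexts--;
}

/* Delete all children of a context, but keep the context itself. */
static void
toy_delete_children(ToyContext *context)
{
    while (context->firstchild != NULL)
        toy_delete(context->firstchild);
}
```

Deleting the per-transaction node in such a tree takes every per-statement and per-scan descendant with it, which is why only the one top-level pointer needs to be remembered.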
|
|
||||||
|
|
||||||
|
Memory Context Reset/Delete Callbacks
|
||||||
|
-------------------------------------
|
||||||
|
|
||||||
|
A feature introduced in Postgres 9.5 allows memory contexts to be used
|
||||||
|
for managing more resources than just plain palloc'd memory. This is
|
||||||
|
done by registering a "reset callback function" for a memory context.
|
||||||
|
Such a function will be called, once, just before the context is next
|
||||||
|
reset or deleted. It can be used to give up resources that are in some
|
||||||
|
sense associated with an object allocated within the context. Possible
|
||||||
|
use-cases include
|
||||||
|
* closing open files associated with a tuplesort object;
|
||||||
|
* releasing reference counts on long-lived cache objects that are held
|
||||||
|
by some object within the context being reset;
|
||||||
|
* freeing malloc-managed memory associated with some palloc'd object.
|
||||||
|
That last case would just represent bad programming practice for pure
|
||||||
|
Postgres code; better to have made all the allocations using palloc,
|
||||||
|
in the target context or some child context. However, it could well
|
||||||
|
come in handy for code that interfaces to non-Postgres libraries.
|
||||||
|
|
||||||
|
Any number of reset callbacks can be established for a memory context;
|
||||||
|
they are called in reverse order of registration. Also, callbacks
|
||||||
|
attached to child contexts are called before callbacks attached to
|
||||||
|
parent contexts, if a tree of contexts is being reset or deleted.
|
||||||
|
|
||||||
|
The API for this requires the caller to provide a MemoryContextCallback
|
||||||
|
memory chunk to hold the state for a callback. Typically this should be
|
||||||
|
allocated in the same context it is logically attached to, so that it
|
||||||
|
will be released automatically after use. The reason for asking the
|
||||||
|
caller to provide this memory is that in most usage scenarios, the caller
|
||||||
|
will be creating some larger struct within the target context, and the
|
||||||
|
MemoryContextCallback struct can be made "for free" without a separate
|
||||||
|
palloc() call by including it in this larger struct.
|
||||||
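The callback rules above (called once, in reverse order of registration, with caller-provided storage that can be embedded in a larger struct) can be modelled in standalone C. This is a sketch with hypothetical names (ToyCallback, ToyContext, ToySortState, toy_*), not the backend API; unlike the real usage described above, the callback storage here lives wherever the caller puts it rather than in the context's own memory.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ToyCallbackFunction) (void *arg);

/* Caller-provided storage for one registered callback. */
typedef struct ToyCallback
{
    ToyCallbackFunction func;
    void       *arg;
    struct ToyCallback *next;
} ToyCallback;

typedef struct ToyContext
{
    ToyCallback *reset_cbs;     /* most recently registered first */
} ToyContext;

/* Push onto the list head, so later registrations are called earlier. */
static void
toy_register_reset_callback(ToyContext *context, ToyCallback *cb)
{
    assert(cb->func != NULL);
    cb->next = context->reset_cbs;
    context->reset_cbs = cb;
}

/* Run and discard all pending callbacks; each one is called only once. */
static void
toy_reset(ToyContext *context)
{
    ToyCallback *cb;

    while ((cb = context->reset_cbs) != NULL)
    {
        context->reset_cbs = cb->next;  /* unlink before calling */
        cb->func(cb->arg);
    }
}

/* A larger struct that gets its callback storage "for free". */
typedef struct ToySortState
{
    int         file_is_open;
    ToyCallback cleanup;
} ToySortState;

static void
toy_close_sort_file(void *arg)
{
    ((ToySortState *) arg)->file_is_open = 0;
}

/* Records the order in which logging callbacks fire. */
static char toy_call_log[8];
static size_t toy_call_count = 0;

static void
toy_log_call(void *arg)
{
    if (toy_call_count < sizeof(toy_call_log))
        toy_call_log[toy_call_count++] = *(const char *) arg;
}
```

Unlinking a callback before invoking it is what makes a second reset a no-op for that callback in this model.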
|
|
||||||
|
|
||||||
|
Memory Contexts in Practice
|
||||||
|
===========================
|
||||||
|
|
||||||
Globally Known Contexts
|
Globally Known Contexts
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
@ -325,83 +350,64 @@ copy step.
|
|||||||
Mechanisms to Allow Multiple Types of Contexts
|
Mechanisms to Allow Multiple Types of Contexts
|
||||||
----------------------------------------------
|
----------------------------------------------
|
||||||
|
|
||||||
We may want several different types of memory contexts with different
|
To efficiently allow for different allocation patterns, and for
|
||||||
allocation policies but similar external behavior. To handle this,
|
experimentation, we allow for different types of memory contexts with
|
||||||
memory allocation functions will be accessed via function pointers,
|
different allocation policies but similar external behavior. To
|
||||||
and we will require all context types to obey the conventions given here.
|
handle this, memory allocation functions are accessed via function
|
||||||
(As of 2015, there's actually still just one context type; but interest in
|
pointers, and we require all context types to obey the conventions
|
||||||
creating other types has never gone away entirely, so we retain this API.)
|
given here.
|
||||||
|
|
||||||
A memory context is represented by an object like
|
A memory context is represented by struct MemoryContextData (see
|
||||||
|
memnodes.h). This struct identifies the exact type of the context, and
|
||||||
|
contains information common between the different types of
|
||||||
|
MemoryContext like the parent and child contexts, and the name of the
|
||||||
|
context.
|
||||||
|
|
||||||
typedef struct MemoryContextData
|
This is essentially an abstract superclass, and the behavior is
|
||||||
{
|
determined by the "methods" pointer, which is its virtual function table
|
||||||
NodeTag type; /* identifies exact kind of context */
|
(struct MemoryContextMethods). Specific memory context types will use
|
||||||
MemoryContextMethods methods;
|
|
||||||
MemoryContextData *parent; /* NULL if no parent (toplevel context) */
|
|
||||||
MemoryContextData *firstchild; /* head of linked list of children */
|
|
||||||
MemoryContextData *nextchild; /* next child of same parent */
|
|
||||||
char *name; /* context name (just for debugging) */
|
|
||||||
} MemoryContextData, *MemoryContext;
|
|
||||||
|
|
||||||
This is essentially an abstract superclass, and the "methods" pointer is
|
|
||||||
its virtual function table. Specific memory context types will use
|
|
||||||
derived structs having these fields as their first fields. All the
|
derived structs having these fields as their first fields. All the
|
||||||
contexts of a specific type will have methods pointers that point to the
|
contexts of a specific type will have methods pointers that point to
|
||||||
same static table of function pointers, which look like
|
the same static table of function pointers.
|
||||||
|
|
||||||
typedef struct MemoryContextMethodsData
|
While operations like allocating from and resetting a context take the
|
||||||
{
|
relevant MemoryContext as a parameter, operations like free and
|
||||||
Pointer (*alloc) (MemoryContext c, Size size);
|
realloc are trickier. To make those work, we require all memory
|
||||||
void (*free_p) (Pointer chunk);
|
context types to produce allocated chunks that are immediately,
|
||||||
Pointer (*realloc) (Pointer chunk, Size newsize);
|
without any padding, preceded by a pointer to the corresponding
|
||||||
void (*reset) (MemoryContext c);
|
MemoryContext.
|
||||||
void (*delete) (MemoryContext c);
|
|
||||||
} MemoryContextMethodsData, *MemoryContextMethods;
|
|
||||||
|
|
||||||
Alloc, reset, and delete requests will take a MemoryContext pointer
|
If a type of allocator needs additional information about its chunks,
|
||||||
as parameter, so they'll have no trouble finding the method pointer
|
such as the size of the allocation, that information can in turn
|
||||||
to call. Free and realloc are trickier. To make those work, we
|
precede the MemoryContext. This means the only overhead implied by
|
||||||
require all memory context types to produce allocated chunks that
|
the memory context mechanism is a pointer to its context, so we're not
|
||||||
are immediately preceded by a standard chunk header, which has the
|
constraining context-type designers very much.
|
||||||
layout
|
|
||||||
|
|
||||||
typedef struct StandardChunkHeader
|
Given this, routines like pfree determine their corresponding context with an
|
||||||
{
|
operation like (although that is usually encapsulated in
|
||||||
MemoryContext mycontext; /* Link to owning context object */
|
GetMemoryChunkContext())
|
||||||
Size size; /* Allocated size of chunk */
|
|
||||||
};
|
|
||||||
|
|
||||||
It turns out that the pre-existing aset.c memory context type did this
|
MemoryContext context = *(MemoryContext*) (((char *) pointer) - sizeof(void *));
|
||||||
already, and probably any other kind of context would need to have the
|
|
||||||
same data available to support realloc, so this is not really creating
|
|
||||||
any additional overhead. (Note that if a context type needs more per-
|
|
||||||
allocated-chunk information than this, it can make an additional
|
|
||||||
nonstandard header that precedes the standard header. So we're not
|
|
||||||
constraining context-type designers very much.)
|
|
||||||
|
|
||||||
Given this, the pfree routine looks something like
|
and then invoke the corresponding method for the context
|
||||||
|
|
||||||
StandardChunkHeader * header =
|
(*context->methods->free_p) (pointer);
|
||||||
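The chunk-to-context lookup and method dispatch described above can be tried out in a standalone toy allocator. Everything named Toy* or toy_* is hypothetical, and the sketch makes simplifying assumptions: each chunk is an individual malloc() block, the returned pointer is only pointer-aligned, and the method table has just two entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyContext ToyContext;

/* Virtual function table shared by all contexts of one type. */
typedef struct ToyMethods
{
    void       *(*alloc) (ToyContext *context, size_t size);
    void        (*free_p) (void *pointer);
} ToyMethods;

struct ToyContext
{
    const ToyMethods *methods;
    const char *name;
    int         live_chunks;
};

/* Fetch the context pointer stored immediately before the chunk. */
static ToyContext *
toy_chunk_context(void *pointer)
{
    assert(pointer != NULL);
    return *(ToyContext **) (((char *) pointer) - sizeof(void *));
}

/* Allocate a chunk preceded, without padding, by its owning context. */
static void *
toy_simple_alloc(ToyContext *context, size_t size)
{
    char *raw = malloc(sizeof(void *) + size);

    if (raw == NULL)
        abort();
    *(ToyContext **) raw = context;
    context->live_chunks++;
    return raw + sizeof(void *);
}

/* Release a chunk, updating the bookkeeping of its owning context. */
static void
toy_simple_free(void *pointer)
{
    toy_chunk_context(pointer)->live_chunks--;
    free(((char *) pointer) - sizeof(void *));
}

static const ToyMethods toy_simple_methods = {
    toy_simple_alloc,
    toy_simple_free
};

/* Free any chunk without being told which context it came from. */
static void
toy_pfree(void *pointer)
{
    ToyContext *context = toy_chunk_context(pointer);

    context->methods->free_p(pointer);
}
```

Note that toy_pfree() never consults a "current context": the owner is recovered from the chunk itself, which is what allows chunks to be freed regardless of which context is current.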
(StandardChunkHeader *) ((char *) p - sizeof(StandardChunkHeader));
|
|
||||||
|
|
||||||
(*header->mycontext->methods->free_p) (p);
|
|
||||||
|
|
||||||
|
|
||||||
More Control Over aset.c Behavior
|
More Control Over aset.c Behavior
|
||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
Previously, aset.c always allocated an 8K block upon the first allocation
|
By default aset.c always allocates an 8K block upon the first
|
||||||
in a context, and doubled that size for each successive block request.
|
allocation in a context, and doubles that size for each successive
|
||||||
That's good behavior for a context that might hold *lots* of data, and
|
block request. That's good behavior for a context that might hold
|
||||||
the overhead wasn't bad when we had only a few contexts in existence.
|
*lots* of data. But if there are dozens if not hundreds of smaller
|
||||||
With dozens if not hundreds of smaller contexts in the system, we need
|
contexts in the system, we need to be able to fine-tune things a
|
||||||
to be able to fine-tune things a little better.
|
little better.
|
||||||
|
|
||||||
The creator of a context is now able to specify an initial block size
|
The creator of a context is able to specify an initial block size and
|
||||||
and a maximum block size. Selecting smaller values can prevent wastage
|
a maximum block size. Selecting smaller values can prevent wastage of
|
||||||
of space in contexts that aren't expected to hold very much (an example is
|
space in contexts that aren't expected to hold very much (an example
|
||||||
the relcache's per-relation contexts).
|
is the relcache's per-relation contexts).
|
||||||
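The effect of the initial and maximum block size on space consumption can be estimated with a little arithmetic. The sketch below is not taken from aset.c: the function names are hypothetical, and it assumes that doubling simply stops once the maximum block size is reached and that every block request is satisfied with a block of the current size.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the block that follows one of the given size. */
static size_t
toy_next_block_size(size_t current, size_t max_block_size)
{
    size_t next = current * 2;

    return (next > max_block_size) ? max_block_size : next;
}

/* Total space held after the given number of block requests. */
static size_t
toy_total_block_space(size_t init_block_size, size_t max_block_size,
                      int nblocks)
{
    size_t total = 0;
    size_t size = init_block_size;
    int    i;

    assert(init_block_size > 0 && init_block_size <= max_block_size);
    for (i = 0; i < nblocks; i++)
    {
        total += size;
        size = toy_next_block_size(size, max_block_size);
    }
    return total;
}
```

Under these assumptions, three blocks starting from 8K with a large maximum add up to 8K + 16K + 32K = 56K, while four blocks starting from 1K with a 4K maximum add up to only 1K + 2K + 4K + 4K = 11K.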
|
|
||||||
Also, it is possible to specify a minimum context size. If this
|
Also, it is possible to specify a minimum context size. If this
|
||||||
value is greater than zero then a block of that size will be grabbed
|
value is greater than zero then a block of that size will be grabbed
|
||||||
@ -414,37 +420,3 @@ will not allocate very much space per tuple cycle. To make this usage
|
|||||||
pattern cheap, the first block allocated in a context is not given
|
pattern cheap, the first block allocated in a context is not given
|
||||||
back to malloc() during reset, but just cleared. This avoids malloc
|
back to malloc() during reset, but just cleared. This avoids malloc
|
||||||
thrashing.
|
thrashing.
|
||||||
|
|
||||||
|
|
||||||
Memory Context Reset/Delete Callbacks
|
|
||||||
-------------------------------------
|
|
||||||
|
|
||||||
A feature introduced in Postgres 9.5 allows memory contexts to be used
|
|
||||||
for managing more resources than just plain palloc'd memory. This is
|
|
||||||
done by registering a "reset callback function" for a memory context.
|
|
||||||
Such a function will be called, once, just before the context is next
|
|
||||||
reset or deleted. It can be used to give up resources that are in some
|
|
||||||
sense associated with an object allocated within the context. Possible
|
|
||||||
use-cases include
|
|
||||||
* closing open files associated with a tuplesort object;
|
|
||||||
* releasing reference counts on long-lived cache objects that are held
|
|
||||||
by some object within the context being reset;
|
|
||||||
* freeing malloc-managed memory associated with some palloc'd object.
|
|
||||||
That last case would just represent bad programming practice for pure
|
|
||||||
Postgres code; better to have made all the allocations using palloc,
|
|
||||||
in the target context or some child context. However, it could well
|
|
||||||
come in handy for code that interfaces to non-Postgres libraries.
|
|
||||||
|
|
||||||
Any number of reset callbacks can be established for a memory context;
|
|
||||||
they are called in reverse order of registration. Also, callbacks
|
|
||||||
attached to child contexts are called before callbacks attached to
|
|
||||||
parent contexts, if a tree of contexts is being reset or deleted.
|
|
||||||
|
|
||||||
The API for this requires the caller to provide a MemoryContextCallback
|
|
||||||
memory chunk to hold the state for a callback. Typically this should be
|
|
||||||
allocated in the same context it is logically attached to, so that it
|
|
||||||
will be released automatically after use. The reason for asking the
|
|
||||||
caller to provide this memory is that in most usage scenarios, the caller
|
|
||||||
will be creating some larger struct within the target context, and the
|
|
||||||
MemoryContextCallback struct can be made "for free" without a separate
|
|
||||||
palloc() call by including it in this larger struct.
|
|
||||||
|