docs/devel/rcu: Convert to rST format
Convert docs/devel/rcu.txt to rST format.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20240816132212.3602106-6-peter.maydell@linaro.org
parent 4f0b3e0b95
commit 90655d815a
@ -3059,7 +3059,7 @@ Read, Copy, Update (RCU)
|
|||||||
M: Paolo Bonzini <pbonzini@redhat.com>
|
M: Paolo Bonzini <pbonzini@redhat.com>
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: docs/devel/lockcnt.rst
|
F: docs/devel/lockcnt.rst
|
||||||
F: docs/devel/rcu.txt
|
F: docs/devel/rcu.rst
|
||||||
F: include/qemu/rcu*.h
|
F: include/qemu/rcu*.h
|
||||||
F: tests/unit/rcutorture.c
|
F: tests/unit/rcutorture.c
|
||||||
F: tests/unit/test-rcu-*.c
|
F: tests/unit/test-rcu-*.c
|
||||||
|
@ -8,6 +8,7 @@ Details about QEMU's various subsystems including how to add features to them.
|
|||||||
|
|
||||||
qom
|
qom
|
||||||
atomics
|
atomics
|
||||||
|
rcu
|
||||||
block-coroutine-wrapper
|
block-coroutine-wrapper
|
||||||
clocks
|
clocks
|
||||||
ebpf_rss
|
ebpf_rss
|
||||||
|
@ -20,7 +20,7 @@ for the execution of all *currently running* critical sections before
|
|||||||
proceeding, or before asynchronously executing a callback.
|
proceeding, or before asynchronously executing a callback.
|
||||||
|
|
||||||
The key point here is that only the currently running critical sections
|
The key point here is that only the currently running critical sections
|
||||||
are waited for; critical sections that are started _after_ the beginning
|
are waited for; critical sections that are started **after** the beginning
|
||||||
of the wait do not extend the wait, despite running concurrently with
|
of the wait do not extend the wait, despite running concurrently with
|
||||||
the updater. This is the reason why RCU is more scalable than,
|
the updater. This is the reason why RCU is more scalable than,
|
||||||
for example, reader-writer locks. It is so much more scalable that
|
for example, reader-writer locks. It is so much more scalable that
|
||||||
@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections have finished,
|
|||||||
there cannot be any readers who hold references to the data structure,
|
there cannot be any readers who hold references to the data structure,
|
||||||
and these can now be safely reclaimed (e.g., freed or unref'ed).
|
and these can now be safely reclaimed (e.g., freed or unref'ed).
|
||||||
|
|
||||||
Here is a picture:
|
Here is a picture::
|
||||||
|
|
||||||
thread 1 thread 2 thread 3
|
thread 1 thread 2 thread 3
|
||||||
------------------- ------------------------ -------------------
|
------------------- ------------------------ -------------------
|
||||||
@ -58,43 +58,38 @@ that critical section.
|
|||||||
|
|
||||||
|
|
||||||
RCU API
|
RCU API
|
||||||
=======
|
-------
|
||||||
|
|
||||||
The core RCU API is small:
|
The core RCU API is small:
|
||||||
|
|
||||||
void rcu_read_lock(void);
|
``void rcu_read_lock(void);``
|
||||||
|
|
||||||
Used by a reader to inform the reclaimer that the reader is
|
Used by a reader to inform the reclaimer that the reader is
|
||||||
entering an RCU read-side critical section.
|
entering an RCU read-side critical section.
|
||||||
|
|
||||||
void rcu_read_unlock(void);
|
``void rcu_read_unlock(void);``
|
||||||
|
|
||||||
Used by a reader to inform the reclaimer that the reader is
|
Used by a reader to inform the reclaimer that the reader is
|
||||||
exiting an RCU read-side critical section. Note that RCU
|
exiting an RCU read-side critical section. Note that RCU
|
||||||
read-side critical sections may be nested and/or overlapping.
|
read-side critical sections may be nested and/or overlapping.
|
||||||
|
|
||||||
void synchronize_rcu(void);
|
``void synchronize_rcu(void);``
|
||||||
|
|
||||||
Blocks until all pre-existing RCU read-side critical sections
|
Blocks until all pre-existing RCU read-side critical sections
|
||||||
on all threads have completed. This marks the end of the removal
|
on all threads have completed. This marks the end of the removal
|
||||||
phase and the beginning of reclamation phase.
|
phase and the beginning of reclamation phase.
|
||||||
|
|
||||||
Note that it would be valid for another update to come while
|
Note that it would be valid for another update to come while
|
||||||
synchronize_rcu is running. Because of this, it is better that
|
``synchronize_rcu`` is running. Because of this, it is better that
|
||||||
the updater releases any locks it may hold before calling
|
the updater releases any locks it may hold before calling
|
||||||
synchronize_rcu. If this is not possible (for example, because
|
``synchronize_rcu``. If this is not possible (for example, because
|
||||||
the updater is protected by the BQL), you can use call_rcu.
|
the updater is protected by the BQL), you can use ``call_rcu``.
|
||||||
|
|
||||||
void call_rcu1(struct rcu_head * head,
|
``void call_rcu1(struct rcu_head * head, void (*func)(struct rcu_head *head));``
|
||||||
void (*func)(struct rcu_head *head));
|
This function invokes ``func(head)`` after all pre-existing RCU
|
||||||
|
|
||||||
This function invokes func(head) after all pre-existing RCU
|
|
||||||
read-side critical sections on all threads have completed. This
|
read-side critical sections on all threads have completed. This
|
||||||
marks the end of the removal phase, with func taking care
|
marks the end of the removal phase, with func taking care
|
||||||
asynchronously of the reclamation phase.
|
asynchronously of the reclamation phase.
|
||||||
|
|
||||||
The foo struct needs to have an rcu_head structure added,
|
The ``foo`` struct needs to have an ``rcu_head`` structure added,
|
||||||
perhaps as follows:
|
perhaps as follows::
|
||||||
|
|
||||||
struct foo {
|
struct foo {
|
||||||
struct rcu_head rcu;
|
struct rcu_head rcu;
|
||||||
@ -103,8 +98,8 @@ The core RCU API is small:
|
|||||||
long c;
|
long c;
|
||||||
};
|
};
|
||||||
|
|
||||||
so that the reclaimer function can fetch the struct foo address
|
so that the reclaimer function can fetch the ``struct foo`` address
|
||||||
and free it:
|
and free it::
|
||||||
|
|
||||||
call_rcu1(&foo.rcu, foo_reclaim);
|
call_rcu1(&foo.rcu, foo_reclaim);
|
||||||
|
|
||||||
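The hunk above passes the address of the embedded ``rcu_head`` to ``call_rcu1`` and expects the callback to recover the enclosing ``struct foo``. A minimal, single-threaded sketch of that recovery step follows. It is not QEMU code: ``struct rcu_head`` here is a hypothetical stand-in, ``demo_container_of`` and ``demo_call_rcu1`` are illustrative names, and the stand-in runs the callback immediately instead of waiting for a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's struct rcu_head; illustration only. */
struct rcu_head {
    void (*func)(struct rcu_head *head);
};

struct foo {
    struct rcu_head rcu;   /* embedded head, handed to the callback */
    long c;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int reclaimed_count;

/* Reclaimer: receives the rcu_head, frees the struct foo around it. */
static void foo_reclaim(struct rcu_head *rp)
{
    struct foo *fp = demo_container_of(rp, struct foo, rcu);
    free(fp);
    reclaimed_count++;
}

/*
 * Stand-in for call_rcu1(). ASSUMPTION: there are no readers in this
 * demo, so the callback runs at once; a real implementation defers it
 * until all pre-existing read-side critical sections have finished.
 */
static void demo_call_rcu1(struct rcu_head *head,
                           void (*func)(struct rcu_head *head))
{
    head->func = func;
    head->func(head);
}

/* Allocate one struct foo and hand it to the reclaimer. */
static int demo_reclaim_one(void)
{
    struct foo *fp = malloc(sizeof(*fp));
    assert(fp != NULL);
    fp->c = 42;
    demo_call_rcu1(&fp->rcu, foo_reclaim);
    return reclaimed_count;
}
```

Because the callback only ever sees the ``rcu_head`` pointer, the offset arithmetic is what ties it back to the object being freed; when the head is the first member, that offset is zero.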
@ -114,29 +109,27 @@ The core RCU API is small:
|
|||||||
g_free(fp);
|
g_free(fp);
|
||||||
}
|
}
|
||||||
|
|
||||||
For the common case where the rcu_head member is the first of the
|
``call_rcu1`` is typically used via either the ``call_rcu`` or
|
||||||
struct, you can use the following macro.
|
``g_free_rcu`` macros, which handle the common case where the
|
||||||
|
``rcu_head`` member is the first of the struct.
|
||||||
|
|
||||||
void call_rcu(T *p,
|
``void call_rcu(T *p, void (*func)(T *p), field-name);``
|
||||||
void (*func)(T *p),
|
If the ``struct rcu_head`` is the first field in the struct, you can
|
||||||
field-name);
|
use this macro instead of ``call_rcu1``.
|
||||||
void g_free_rcu(T *p,
|
|
||||||
field-name);
|
|
||||||
|
|
||||||
call_rcu1 is typically used through these macro, in the common case
|
``void g_free_rcu(T *p, field-name);``
|
||||||
where the "struct rcu_head" is the first field in the struct. If
|
This is a special-case version of ``call_rcu`` where the callback
|
||||||
the callback function is g_free, in particular, g_free_rcu can be
|
function is ``g_free``.
|
||||||
used. In the above case, one could have written simply:
|
In the example given in ``call_rcu1``, one could have written simply::
|
||||||
|
|
||||||
g_free_rcu(&foo, rcu);
|
g_free_rcu(&foo, rcu);
|
||||||
|
|
||||||
typeof(*p) qatomic_rcu_read(p);
|
``typeof(*p) qatomic_rcu_read(p);``
|
||||||
|
``qatomic_rcu_read()`` is similar to ``qatomic_load_acquire()``, but
|
||||||
|
it makes some assumptions on the code that calls it. This allows a
|
||||||
|
more optimized implementation.
|
||||||
|
|
||||||
qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
|
``qatomic_rcu_read`` assumes that whenever a single RCU critical
|
||||||
some assumptions on the code that calls it. This allows a more
|
|
||||||
optimized implementation.
|
|
||||||
|
|
||||||
qatomic_rcu_read assumes that whenever a single RCU critical
|
|
||||||
section reads multiple shared data, these reads are either
|
section reads multiple shared data, these reads are either
|
||||||
data-dependent or need no ordering. This is almost always the
|
data-dependent or need no ordering. This is almost always the
|
||||||
case when using RCU, because read-side critical sections typically
|
case when using RCU, because read-side critical sections typically
|
||||||
@ -144,7 +137,7 @@ The core RCU API is small:
|
|||||||
every update) until reaching a data structure of interest,
|
every update) until reaching a data structure of interest,
|
||||||
and then read from there.
|
and then read from there.
|
||||||
|
|
||||||
RCU read-side critical sections must use qatomic_rcu_read() to
|
RCU read-side critical sections must use ``qatomic_rcu_read()`` to
|
||||||
read data, unless concurrent writes are prevented by another
|
read data, unless concurrent writes are prevented by another
|
||||||
synchronization mechanism.
|
synchronization mechanism.
|
||||||
|
|
||||||
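The read and publish ordering described for ``qatomic_rcu_read()`` and ``qatomic_rcu_set()`` can be approximated with C11 atomics. The sketch below is an assumption-laden illustration rather than QEMU's implementation: it uses a ``memory_order_acquire`` load and a ``memory_order_release`` store from ``<stdatomic.h>`` as stand-ins, the names ``struct config``, ``publish_config`` and ``read_sum`` are hypothetical, and updaters are assumed to be serialized by some lock that is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config {
    int a;
    int b;
};

/* Shared pointer that readers follow; starts out NULL. */
static _Atomic(struct config *) global_cfg;

/*
 * Updater: initialize the new copy completely, then publish it with a
 * release store so that readers never observe a half-built object.
 * Returns the previous copy; the caller must not free it until a grace
 * period has elapsed (not modelled here).
 */
static struct config *publish_config(int a, int b)
{
    struct config *old = atomic_load_explicit(&global_cfg,
                                              memory_order_relaxed);
    struct config *new_cfg = malloc(sizeof(*new_cfg));
    assert(new_cfg != NULL);
    new_cfg->a = a;
    new_cfg->b = b;
    atomic_store_explicit(&global_cfg, new_cfg, memory_order_release);
    return old;
}

/*
 * Reader: load the pointer once, then read only through that pointer,
 * so every later access is data-dependent on the first load.
 */
static int read_sum(void)
{
    struct config *cfg = atomic_load_explicit(&global_cfg,
                                              memory_order_acquire);
    return cfg ? cfg->a + cfg->b : -1;
}
```

The reader walks from the published pointer into the data, while the updater fills in the data before storing the pointer, which matches the "opposite direction" rule stated in the text.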
@ -152,18 +145,17 @@ The core RCU API is small:
|
|||||||
data structure in a single direction, opposite to the direction
|
data structure in a single direction, opposite to the direction
|
||||||
in which the updater initializes it.
|
in which the updater initializes it.
|
||||||
|
|
||||||
void qatomic_rcu_set(p, typeof(*p) v);
|
``void qatomic_rcu_set(p, typeof(*p) v);``
|
||||||
|
``qatomic_rcu_set()`` is similar to ``qatomic_store_release()``,
|
||||||
|
though it also makes assumptions on the code that calls it in
|
||||||
|
order to allow a more optimized implementation.
|
||||||
|
|
||||||
qatomic_rcu_set() is similar to qatomic_store_release(), though it also
|
In particular, ``qatomic_rcu_set()`` suffices for synchronization
|
||||||
makes assumptions on the code that calls it in order to allow a more
|
|
||||||
optimized implementation.
|
|
||||||
|
|
||||||
In particular, qatomic_rcu_set() suffices for synchronization
|
|
||||||
with readers, if the updater never mutates a field within a
|
with readers, if the updater never mutates a field within a
|
||||||
data item that is already accessible to readers. This is the
|
data item that is already accessible to readers. This is the
|
||||||
case when initializing a new copy of the RCU-protected data
|
case when initializing a new copy of the RCU-protected data
|
||||||
structure; just ensure that initialization of *p is carried out
|
structure; just ensure that initialization of ``*p`` is carried out
|
||||||
before qatomic_rcu_set() makes the data item visible to readers.
|
before ``qatomic_rcu_set()`` makes the data item visible to readers.
|
||||||
If this rule is observed, writes will happen in the opposite
|
If this rule is observed, writes will happen in the opposite
|
||||||
order as reads in the RCU read-side critical sections (or if
|
order as reads in the RCU read-side critical sections (or if
|
||||||
there is just one update), and there will be no need for other
|
there is just one update), and there will be no need for other
|
||||||
@ -171,58 +163,54 @@ The core RCU API is small:
|
|||||||
|
|
||||||
The following APIs must be used before RCU is used in a thread:
|
The following APIs must be used before RCU is used in a thread:
|
||||||
|
|
||||||
void rcu_register_thread(void);
|
``void rcu_register_thread(void);``
|
||||||
|
|
||||||
Mark a thread as taking part in the RCU mechanism. Such a thread
|
Mark a thread as taking part in the RCU mechanism. Such a thread
|
||||||
will have to report quiescent points regularly, either manually
|
will have to report quiescent points regularly, either manually
|
||||||
or through the QemuCond/QemuSemaphore/QemuEvent APIs.
|
or through the ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
|
||||||
|
|
||||||
void rcu_unregister_thread(void);
|
|
||||||
|
|
||||||
|
``void rcu_unregister_thread(void);``
|
||||||
Mark a thread as not taking part anymore in the RCU mechanism.
|
Mark a thread as not taking part anymore in the RCU mechanism.
|
||||||
It is not a problem if such a thread reports quiescent points,
|
It is not a problem if such a thread reports quiescent points,
|
||||||
either manually or by using the QemuCond/QemuSemaphore/QemuEvent
|
either manually or by using the
|
||||||
APIs.
|
``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
|
||||||
|
|
||||||
Note that these APIs are relatively heavyweight, and should _not_ be
|
Note that these APIs are relatively heavyweight, and should **not** be
|
||||||
nested.
|
nested.
|
||||||
|
|
||||||
Convenience macros
|
Convenience macros
|
||||||
==================
|
------------------
|
||||||
|
|
||||||
Two macros are provided that automatically release the read lock at the
|
Two macros are provided that automatically release the read lock at the
|
||||||
end of the scope.
|
end of the scope.
|
||||||
|
|
||||||
RCU_READ_LOCK_GUARD()
|
``RCU_READ_LOCK_GUARD()``
|
||||||
|
|
||||||
Takes the lock and will release it at the end of the block it's
|
Takes the lock and will release it at the end of the block it's
|
||||||
used in.
|
used in.
|
||||||
|
|
||||||
WITH_RCU_READ_LOCK_GUARD() { code }
|
``WITH_RCU_READ_LOCK_GUARD() { code }``
|
||||||
|
|
||||||
Is used at the head of a block to protect the code within the block.
|
Is used at the head of a block to protect the code within the block.
|
||||||
|
|
||||||
Note that 'goto'ing out of the guarded block will also drop the lock.
|
Note that a ``goto`` out of the guarded block will also drop the lock.
|
||||||
|
|
||||||
DIFFERENCES WITH LINUX
|
Differences with Linux
|
||||||
======================
|
----------------------
|
||||||
|
|
||||||
- Waiting on a mutex is possible, though discouraged, within an RCU critical
|
- Waiting on a mutex is possible, though discouraged, within an RCU critical
|
||||||
section. This is because spinlocks are rarely (if ever) used in userspace
|
section. This is because spinlocks are rarely (if ever) used in userspace
|
||||||
programming; not allowing this would prevent upgrading an RCU read-side
|
programming; not allowing this would prevent upgrading an RCU read-side
|
||||||
critical section to become an updater.
|
critical section to become an updater.
|
||||||
|
|
||||||
- qatomic_rcu_read and qatomic_rcu_set replace rcu_dereference and
|
- ``qatomic_rcu_read`` and ``qatomic_rcu_set`` replace ``rcu_dereference`` and
|
||||||
rcu_assign_pointer. They take a _pointer_ to the variable being accessed.
|
``rcu_assign_pointer``. They take a **pointer** to the variable being accessed.
|
||||||
|
|
||||||
- call_rcu is a macro that has an extra argument (the name of the first
|
- ``call_rcu`` is a macro that has an extra argument (the name of the first
|
||||||
field in the struct, which must be a struct rcu_head), and expects the
|
field in the struct, which must be a struct ``rcu_head``), and expects the
|
||||||
type of the callback's argument to be the type of the first argument.
|
type of the callback's argument to be the type of the first argument.
|
||||||
call_rcu1 is the same as Linux's call_rcu.
|
``call_rcu1`` is the same as Linux's ``call_rcu``.
|
||||||
|
|
||||||
|
|
||||||
RCU PATTERNS
|
RCU Patterns
|
||||||
============
|
------------
|
||||||
|
|
||||||
Many patterns using read-writer locks translate directly to RCU, with
|
Many patterns using read-writer locks translate directly to RCU, with
|
||||||
the advantages of higher scalability and deadlock immunity.
|
the advantages of higher scalability and deadlock immunity.
|
||||||
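The scope-based guards listed under "Convenience macros" in this hunk release the read lock automatically when the enclosing block exits. One way such a guard can be built is with the GCC/Clang ``cleanup`` variable attribute, sketched below. This is only an illustration of the technique under that assumption and is not claimed to be how QEMU defines ``RCU_READ_LOCK_GUARD()``; ``DEMO_READ_LOCK_GUARD``, ``demo_read_lock`` and ``demo_read_unlock`` are hypothetical names, and the "lock" is just a nesting counter.

```c
#include <assert.h>

/* Stand-in for the read-side nesting depth of the current thread. */
static int demo_depth;

static void demo_read_lock(void)   { demo_depth++; }
static void demo_read_unlock(void) { demo_depth--; }

/* Called by the compiler when the guard variable goes out of scope. */
static void demo_guard_cleanup(int *unused)
{
    (void)unused;
    demo_read_unlock();
}

/*
 * Hypothetical guard macro: takes the lock where it is declared and
 * drops it when the enclosing scope is left. Relies on the GCC/Clang
 * __attribute__((cleanup)) extension; not standard C.
 */
#define DEMO_READ_LOCK_GUARD() \
    int demo_guard_ __attribute__((cleanup(demo_guard_cleanup), unused)) = \
        (demo_read_lock(), 0)

static int demo_lookup(int key)
{
    DEMO_READ_LOCK_GUARD();
    assert(demo_depth == 1);
    if (key < 0) {
        return -1;      /* early return: the cleanup handler still runs */
    }
    return key * 2;
}
```

The benefit of this style is that every exit path, including early returns, leaves the critical section without an explicit unlock call.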
@ -243,28 +231,28 @@ Here are some frequently-used RCU idioms that are worth noting.
|
|||||||
|
|
||||||
|
|
||||||
RCU list processing
|
RCU list processing
|
||||||
-------------------
|
^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
TBD (not yet used in QEMU)
|
TBD (not yet used in QEMU)
|
||||||
|
|
||||||
|
|
||||||
RCU reference counting
|
RCU reference counting
|
||||||
----------------------
|
^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Because grace periods are not allowed to complete while there is an RCU
|
Because grace periods are not allowed to complete while there is an RCU
|
||||||
read-side critical section in progress, the RCU read-side primitives
|
read-side critical section in progress, the RCU read-side primitives
|
||||||
may be used as a restricted reference-counting mechanism. For example,
|
may be used as a restricted reference-counting mechanism. For example,
|
||||||
consider the following code fragment:
|
consider the following code fragment::
|
||||||
|
|
||||||
rcu_read_lock();
|
rcu_read_lock();
|
||||||
p = qatomic_rcu_read(&foo);
|
p = qatomic_rcu_read(&foo);
|
||||||
/* do something with p. */
|
/* do something with p. */
|
||||||
rcu_read_unlock();
|
rcu_read_unlock();
|
||||||
|
|
||||||
The RCU read-side critical section ensures that the value of "p" remains
|
The RCU read-side critical section ensures that the value of ``p`` remains
|
||||||
valid until after the rcu_read_unlock(). In some sense, it is acquiring
|
valid until after the ``rcu_read_unlock()``. In some sense, it is acquiring
|
||||||
a reference to p that is later released when the critical section ends.
|
a reference to ``p`` that is later released when the critical section ends.
|
||||||
The write side looks simply like this (with appropriate locking):
|
The write side looks simply like this (with appropriate locking)::
|
||||||
|
|
||||||
qemu_mutex_lock(&foo_mutex);
|
qemu_mutex_lock(&foo_mutex);
|
||||||
old = foo;
|
old = foo;
|
||||||
@ -274,7 +262,7 @@ The write side looks simply like this (with appropriate locking):
|
|||||||
free(old);
|
free(old);
|
||||||
|
|
||||||
If the processing cannot be done purely within the critical section, it
|
If the processing cannot be done purely within the critical section, it
|
||||||
is possible to combine this idiom with a "real" reference count:
|
is possible to combine this idiom with a "real" reference count::
|
||||||
|
|
||||||
rcu_read_lock();
|
rcu_read_lock();
|
||||||
p = qatomic_rcu_read(&foo);
|
p = qatomic_rcu_read(&foo);
|
||||||
@ -283,7 +271,7 @@ is possible to combine this idiom with a "real" reference count:
|
|||||||
/* do something with p. */
|
/* do something with p. */
|
||||||
foo_unref(p);
|
foo_unref(p);
|
||||||
|
|
||||||
The write side can be like this:
|
The write side can be like this::
|
||||||
|
|
||||||
qemu_mutex_lock(&foo_mutex);
|
qemu_mutex_lock(&foo_mutex);
|
||||||
old = foo;
|
old = foo;
|
||||||
@ -292,7 +280,7 @@ The write side can be like this:
|
|||||||
synchronize_rcu();
|
synchronize_rcu();
|
||||||
foo_unref(old);
|
foo_unref(old);
|
||||||
|
|
||||||
or with call_rcu:
|
or with ``call_rcu``::
|
||||||
|
|
||||||
qemu_mutex_lock(&foo_mutex);
|
qemu_mutex_lock(&foo_mutex);
|
||||||
old = foo;
|
old = foo;
|
||||||
@ -301,10 +289,10 @@ or with call_rcu:
|
|||||||
call_rcu(foo_unref, old, rcu);
|
call_rcu(foo_unref, old, rcu);
|
||||||
|
|
||||||
In both cases, the write side only performs removal. Reclamation
|
In both cases, the write side only performs removal. Reclamation
|
||||||
happens when the last reference to a "foo" object is dropped.
|
happens when the last reference to a ``foo`` object is dropped.
|
||||||
Using synchronize_rcu() is undesirably expensive, because the
|
Using ``synchronize_rcu()`` is undesirably expensive, because the
|
||||||
last reference may be dropped on the read side. Hence you can
|
last reference may be dropped on the read side. Hence you can
|
||||||
use call_rcu() instead:
|
use ``call_rcu()`` instead::
|
||||||
|
|
||||||
foo_unref(struct foo *p) {
|
foo_unref(struct foo *p) {
|
||||||
if (qatomic_fetch_dec(&p->refcount) == 1) {
|
if (qatomic_fetch_dec(&p->refcount) == 1) {
|
||||||
@ -314,7 +302,7 @@ use call_rcu() instead:
|
|||||||
|
|
||||||
|
|
||||||
Note that the same idioms would be possible with reader/writer
|
Note that the same idioms would be possible with reader/writer
|
||||||
locks:
|
locks::
|
||||||
|
|
||||||
read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
|
read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
|
||||||
p = foo; p = foo;
|
p = foo; p = foo;
|
||||||
@ -334,15 +322,15 @@ locks:
|
|||||||
foo_unref(p);
|
foo_unref(p);
|
||||||
read_unlock(&foo_rwlock);
|
read_unlock(&foo_rwlock);
|
||||||
|
|
||||||
foo_unref could use a mechanism such as bottom halves to move deallocation
|
``foo_unref`` could use a mechanism such as bottom halves to move deallocation
|
||||||
out of the write-side critical section.
|
out of the write-side critical section.
|
||||||
|
|
||||||
|
|
||||||
RCU resizable arrays
|
RCU resizable arrays
|
||||||
--------------------
|
^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Resizable arrays can be used with RCU. The expensive RCU synchronization
|
Resizable arrays can be used with RCU. The expensive RCU synchronization
|
||||||
(or call_rcu) only needs to take place when the array is resized.
|
(or ``call_rcu``) only needs to take place when the array is resized.
|
||||||
The two items to take care of are:
|
The two items to take care of are:
|
||||||
|
|
||||||
- ensuring that the old version of the array is available between removal
|
- ensuring that the old version of the array is available between removal
|
||||||
@ -351,10 +339,10 @@ The two items to take care of are:
|
|||||||
- avoiding mismatches in the read side between the array data and the
|
- avoiding mismatches in the read side between the array data and the
|
||||||
array size.
|
array size.
|
||||||
|
|
||||||
The first problem is avoided simply by not using realloc. Instead,
|
The first problem is avoided simply by not using ``realloc``. Instead,
|
||||||
each resize will allocate a new array and copy the old data into it.
|
each resize will allocate a new array and copy the old data into it.
|
||||||
The second problem would arise if the size and the data pointers were
|
The second problem would arise if the size and the data pointers were
|
||||||
two members of a larger struct:
|
two members of a larger struct::
|
||||||
|
|
||||||
struct mystuff {
|
struct mystuff {
|
||||||
...
|
...
|
||||||
@ -364,7 +352,7 @@ two members of a larger struct:
|
|||||||
...
|
...
|
||||||
};
|
};
|
||||||
|
|
||||||
Instead, we store the size of the array with the array itself:
|
Instead, we store the size of the array with the array itself::
|
||||||
|
|
||||||
struct arr {
|
struct arr {
|
||||||
int size;
|
int size;
|
||||||
@ -400,7 +388,7 @@ Instead, we store the size of the array with the array itself:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
SOURCES
|
References
|
||||||
=======
|
----------
|
||||||
|
|
||||||
* Documentation/RCU/ from the Linux kernel
|
* The `Linux kernel RCU documentation <https://docs.kernel.org/RCU/>`__
|
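The resizable-array hunks above recommend storing the size together with the data so that a reader always sees a matching pair, and allocating a fresh array on resize instead of calling ``realloc``. A minimal, single-threaded sketch of that layout follows. It is an illustration under stated assumptions, not the code elided from the diff: C11 atomics stand in for ``qatomic_rcu_read``/``qatomic_rcu_set``, the names ``demo_arr``, ``demo_get`` and ``demo_append`` are hypothetical, writers are assumed to be serialized externally, and waiting for a grace period is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Size and data live in one allocation, so they are replaced together. */
struct demo_arr {
    int size;
    int data[];          /* flexible array member */
};

static _Atomic(struct demo_arr *) demo_array;

/* Read side: one pointer load, then size and data from the same copy. */
static int demo_get(int i)
{
    struct demo_arr *arr = atomic_load_explicit(&demo_array,
                                                memory_order_acquire);
    if (arr == NULL || i < 0 || i >= arr->size) {
        return -1;
    }
    return arr->data[i];
}

/*
 * Write side: build a new, larger copy, then publish it. The old copy
 * is returned rather than freed, because readers may still hold it;
 * the caller reclaims it after a grace period (not modelled here).
 */
static struct demo_arr *demo_append(int value)
{
    struct demo_arr *old = atomic_load_explicit(&demo_array,
                                                memory_order_relaxed);
    int old_size = old ? old->size : 0;
    struct demo_arr *new_arr =
        malloc(sizeof(*new_arr) + (size_t)(old_size + 1) * sizeof(int));
    assert(new_arr != NULL);
    if (old_size > 0) {
        memcpy(new_arr->data, old->data, (size_t)old_size * sizeof(int));
    }
    new_arr->data[old_size] = value;
    new_arr->size = old_size + 1;
    atomic_store_explicit(&demo_array, new_arr, memory_order_release);
    return old;
}
```

Growing by one element per append keeps the sketch short; a real implementation would more likely track a separate capacity and grow geometrically so that most appends need no new version.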