docs/system: Convert security.texi to rST format

security.texi is included from qemu-doc.texi but is not used in the qemu.1
manpage. So we can do a straightforward conversion of the contents, which go
into the system manual.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200228153619.9906-17-peter.maydell@linaro.org
Message-id: 20200226113034.6741-16-pbonzini@redhat.com
parent 1bf84a1e2e
commit c02c112a2c
@@ -14,4 +14,5 @@ Contents:
.. toctree::
   :maxdepth: 2

   security
   vfio-ap

173
docs/system/security.rst
Normal file
@@ -0,0 +1,173 @@
Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others. The community has agreed on the overall security
requirements that users may depend on. These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization. These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and treated as security bugs if this is the case.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG). In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case. However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time. Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine. When
guest code gains control of execution on the host this is called escaping the
virtual machine. Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network. Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU. Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU. At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them. A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component only has access to
the privileges necessary for its function. In the case of QEMU this means that
each process only has access to resources belonging to the guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest. This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements. For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.
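
As a rough illustration of the ``a.img`` / ``b.img`` example, a management tool
might check before launch that a guest's image is owned by that guest's
dedicated user and is closed to everyone else. The sketch below is only one
possible check; the helper names and the idea of one UNIX user per guest are
assumptions made for illustration, not something QEMU itself enforces.

```python
import os
import stat


def exposes_to_others(path):
    """Return True if group or other users have any permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def check_guest_image(path, expected_uid):
    """Raise PermissionError unless path is owned by expected_uid and private.

    expected_uid is the (hypothetical) per-guest user that QEMU will run as.
    """
    info = os.stat(path)
    if info.st_uid != expected_uid:
        raise PermissionError(f"{path} is not owned by uid {expected_uid}")
    if exposes_to_others(path):
        raise PermissionError(f"{path} is accessible to other users")
```

File mode bits are only one layer; the mandatory access control mechanisms
described under "Isolation mechanisms" can enforce the same separation more
strictly.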

In reality certain resources are inaccessible to the guest but must be
available to QEMU to perform its function. For example, host system calls are
necessary for QEMU but are not exposed to guests. A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege. With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt. They are also platform-specific so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users. Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
huge security risk. File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
Some Linux distros already ship with UNIX groups for these devices by default.
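
File descriptor passing can be sketched as follows. The example assumes that
some privileged helper has already opened and configured a TAP device and
handed the descriptor to the launcher; the function names, the ``net0``
identifier, and the choice of ``virtio-net-pci`` are illustrative. The
``fd=`` parameter of ``-netdev tap`` tells QEMU to use an inherited descriptor
instead of opening ``/dev/net/tun`` itself.

```python
import subprocess


def tap_netdev_args(fd, netdev_id="net0"):
    """Build QEMU arguments that use an already-open TAP file descriptor."""
    return [
        "-netdev", f"tap,id={netdev_id},fd={fd}",
        "-device", f"virtio-net-pci,netdev={netdev_id}",
    ]


def launch_with_fds(argv, fds, **popen_kwargs):
    """Start argv as a child process that inherits only the listed descriptors.

    Besides stdin, stdout and stderr, Popen closes every descriptor that is
    not named in pass_fds, and the passed ones keep their numbers in the child.
    """
    return subprocess.Popen(argv, pass_fds=tuple(fds), **popen_kwargs)
```

The unprivileged QEMU process then never needs permission to open the device
node; it only receives the one descriptor it was granted.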

- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model. They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU. A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option. It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.
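
A launcher can combine two of the points above, running unprivileged and
enabling seccomp, in a few lines. This is a minimal sketch: ``hardened_argv``
is a hypothetical helper, and denying every optional ``--sandbox`` category is
an assumption that a real deployment may need to relax (for example,
``spawn=deny`` is incompatible with features that start helper processes).

```python
import os

# Enable the seccomp filter and deny every optional category it can restrict.
SANDBOX_ARG = (
    "on,obsolete=deny,elevateprivileges=deny,"
    "spawn=deny,resourcecontrol=deny"
)


def hardened_argv(binary, extra_args=(), euid=None):
    """Return a QEMU argv with seccomp enabled, refusing to build one for root."""
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise PermissionError("refusing to launch QEMU as root")
    return [binary, "--sandbox", SANDBOX_ARG, *extra_args]
```

The other mechanisms in the list (SELinux or AppArmor policy, cgroups,
namespaces) are set up outside the QEMU command line, typically by the
management tool.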

Sensitive configurations
------------------------

There are aspects of QEMU that can have security implications which users and
management applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the ``migrate`` command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.
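
To make the risk concrete, the following sketch builds the kind of QMP
messages a monitor client could send. The ``qmp_command`` helper, the paths,
and the ``leak0`` node name are made up for illustration; the point is only
that nothing in the messages themselves limits which program is run or which
file is opened.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a JSON string."""
    return json.dumps({"execute": name, "arguments": arguments})


# An "exec:" migration URI makes QEMU spawn the given shell command.
migrate_cmd = qmp_command("migrate", uri="exec:gzip -c > /tmp/vm-state.gz")

# The "file" block driver opens whatever host path the client names.
blockdev_cmd = qmp_command(
    "blockdev-add",
    driver="file",
    filename="/etc/passwd",
    **{"node-name": "leak0"},
)
```

Whether such a command succeeds depends only on what the QEMU process itself is
permitted to do on the host.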

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is further important to consider the security of the character device backend
over which the monitor console is exposed. It needs to have protection against
malicious third parties which might try to make unauthorized connections, or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.
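
A client for the recommended setup might look like the sketch below. It
assumes QEMU was started with a QMP monitor listening on a UNIX domain socket
whose path is known to the caller and whose file permissions restrict access
to the management user; the function names are illustrative. The handshake
shown (server greeting, then ``qmp_capabilities``) is the standard QMP
negotiation step.

```python
import json
import socket


def read_message(reader):
    """Read one newline-terminated JSON message from a file-like reader."""
    line = reader.readline()
    if not line:
        raise ConnectionError("monitor closed the connection")
    return json.loads(line)


def qmp_handshake(sock):
    """Negotiate QMP capabilities on an already-connected socket."""
    reader = sock.makefile("r", encoding="utf-8")
    greeting = read_message(reader)
    if "QMP" not in greeting:
        raise ValueError(f"unexpected greeting: {greeting!r}")
    request = json.dumps({"execute": "qmp_capabilities"}) + "\n"
    sock.sendall(request.encode("utf-8"))
    reply = read_message(reader)
    if "return" not in reply:
        raise RuntimeError(f"capability negotiation failed: {reply!r}")
    return greeting


def connect_monitor(path):
    """Connect to a QMP monitor on a local UNIX domain socket at path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock, qmp_handshake(sock)
```

Because the socket is a file system object, access to it can be limited with
ordinary ownership and mode bits, which is not possible for a plain TCP
listener.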

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.