cacheflush is an arm-specific syscall that qemu built for arm
uses. Add it to the whitelist, but only if we're linking with
a recent enough libseccomp.
Signed-off-by: Andrew Jones <drjones@redhat.com>
This is used by memfd code.
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
This is used by "-realtime mlock=on".
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to
set the policy for a memory range. Add the syscall to the seccomp
sandbox whitelist.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
fallocate() is needed for snapshotting. If it isn’t whitelisted
$ qemu-img create -f qcow2 x.qcow 1G
Formatting 'x.qcow', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off
$ qemu-kvm -display none -monitor stdio -sandbox on x.qcow
QEMU 2.1.50 monitor - type 'help' for more information
(qemu) savevm foo
(qemu) loadvm foo
will fail, as will subsequent savevm commands on the same image.
fadvise64(), inotify_init1(), inotify_add_watch() are needed by
the SDL display. Without the whitelist entries,
qemu-kvm -sandbox on
fails immediately.
In my tests fadvise64() is called 50--51 times per VM run. That
number seems independent of the duration of the run. fallocate(),
inotify_init1(), inotify_add_watch() are called once each.
Accordingly, they are added to the whitelist at a very low
priority.
Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com>
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
QEMU needs to call semctl() for correct operation. This particular
problem was identified on shutdown with the following commandline:
# qemu -sandbox on -monitor stdio \
-device intel-hda -device hda-duplex -vnc :0
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Additional testing reveals that PulseAudio requires shmctl() and the
mlock()/munlock() syscalls on some systems/configurations. As before,
on systems that do require these syscalls, the problem can be seen with
the following command line:
# qemu -monitor stdio -sandbox on \
-device intel-hda -device hda-duplex
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
PulseAudio requires the use of shared memory so add shmget(), shmat(),
and shmdt() to the syscall whitelist.
Reported-by: xuhan@redhat.com
Signed-off-by: Paul Moore <pmoore@redhat.com>
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on
"/run/user/<UID>/pulse" which is currently blocked by the syscall
filter; this patch adds the two missing syscalls to the whitelist.
You can reproduce this problem with the following command:
# qemu -monitor stdio -device intel-hda -device hda-duplex
If watched under strace the following syscalls are shown:
mkdir("/run/user/0/pulse", 0700)
fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse]
Reported-by: xuhan@redhat.com
Signed-off-by: Paul Moore <pmoore@redhat.com>
This fixes a bug where we weren't exiting if seccomp_init() failed.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Acked-by: Paul Moore <pmoore@redhat.com>
This was causing Qemu process to hang when using -sandbox on as
discribed on RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Acked-by: Paul Moore <pmoore@redhat.com>
It appears that even a very simple /etc/qemu-ifup configuration can
require the arch_prctl() syscall, see the example below:
#!/bin/sh
/sbin/ifconfig $1 0.0.0.0 up
/usr/sbin/brctl addif <switch> $1
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Message-id: 20130718135703.8247.19213.stgit@localhost
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
A previous commit, "seccomp: add the asynchronous I/O syscalls to the
whitelist", added several asynchronous I/O syscalls but left out the
io_submit() and io_cancel() syscalls. This patch corrects this by
adding the two missing asynchronous I/O syscalls.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Message-id: 20130715193201.943.4913.stgit@localhost
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
v3 update:
- reincluding getrlimit(), it is used by Xen.
v2 update:
- reincluding setrlimit(), it is used by Xen.
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1374518017-10424-3-git-send-email-otubo@linux.vnet.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
v2 update:
- set libseccomp 2.1.0 as requirement on configure script.
Since libseccomp 2.0 there's no need to check the architecture type
anymore.
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1374518017-10424-2-git-send-email-otubo@linux.vnet.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In order to enable the asynchronous I/O functionality when using the
seccomp sandbox we need to add the associated syscalls to the
whitelist.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Message-id: 20130529203001.20939.83322.stgit@localhost
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
According to the bug 855162[0] - there's the need of adding new syscalls
to the whitelist when using Qemu with Libvirt.
[0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
Reported-by: Paul Moore <pmoore@redhat.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
v1:
- I added a syscall struct using priority levels as described in the
libseccomp man page. The priority numbers are based to the frequency
they appear in a sample strace from a regular qemu guest run under
libvirt.
Libseccomp generates linear BPF code to filter system calls, those rules
are read one after another. The priority system places the most common
rules first in order to reduce the overhead when processing them.
v1 -> v2:
- Fixed some style issues
- Removed code from vl.c and created qemu-seccomp.[ch]
- Now using ARRAY_SIZE macro
- Added more syscalls without priority/frequency set yet
v2 -> v3:
- Adding copyright and license information
- Replacing seccomp_whitelist_count just by ARRAY_SIZE
- Adding header protection to qemu-seccomp.h
- Moving QemuSeccompSyscall definition to qemu-seccomp.c
- Negative return from seccomp_start is fatal now.
- Adding open() and execve() to the whitelis
v3 -> v4:
- Tests revealed a bigger set of syscalls.
- seccomp_start() now has an argument to set the mode according to the
configure option trap or kill.
v4 -> v5:
- Tests on x86_64 required a new specific set of system calls.
- libseccomp release 1.0.0: part of the API have changed in this last
release, had to adapt to the new function signatures.