mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Thomas Huth	e81e7b52f9	seccomp: Work-around GCC 4.x bug in gnu99 mode We'd like to compile QEMU with -std=gnu99, but GCC 4.8 currently fails to compile qemu-seccomp.c in this mode: qemu-seccomp.c:45:1: error: initializer element is not constant }; ^ qemu-seccomp.c:45:1: error: (near initialization for ‘sched_setscheduler_arg[0]’) This is due to a compiler bug which has just been fixed in GCC 5.0: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63567 Since we still want to support GCC 4.8 for a while and also want to use gnu99 mode, work-around the issue by expanding the macro manually. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2019-01-22 06:26:32 +01:00
Markus Armbruster	6548459769	seccomp: Clean up error reporting in parse_sandbox() Calling error_report() in a function that takes an Error ** argument is suspicious. parse_sandbox() does that, and then fails without setting an error. Its caller main(), via qemu_opts_foreach(), is fine with it, but clean it up anyway. Cc: Eduardo Otubo <otubo@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com> Message-Id: <20181017082702.5581-18-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Marc-André Lureau	5780760f5e	seccomp: check TSYNC host capability Remove -sandbox option if the host is not capable of TSYNC, since the sandbox will fail at setup time otherwise. This will help libvirt, for ex, to figure out if -sandbox will work. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eduardo Otubo <otubo@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-09-26 15:07:35 +02:00
Marc-André Lureau	70dfabeaa7	seccomp: set the seccomp filter to all threads When using "-sandbox on", the seccomp policy is only applied to the main thread, the vcpu worker thread and other worker threads created after seccomp policy is applied; the seccomp policy is not applied to e.g. the RCU thread because it is created before the seccomp policy is applied and SECCOMP_FILTER_FLAG_TSYNC isn't used. This can be verified with: for task in /proc/`pidof qemu`/task/*; do cat $task/status | grep Secc ; done Seccomp: 2 Seccomp: 0 Seccomp: 0 Seccomp: 2 Seccomp: 2 Seccomp: 2 Starting with libseccomp 2.2.0 and kernel >= 3.17, we can use seccomp_attr_set(ctx, SCMP_FLTATR_CTL_TSYNC, 1) to update the policy on all threads. libseccomp requirement was bumped to 2.2.0 in previous patch. libseccomp should fail to set the filter if it can't honour SCMP_FLTATR_CTL_TSYNC (untested), and thus -sandbox will now fail on kernel < 3.17. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-08-23 16:45:44 +02:00
Marc-André Lureau	bda08a5764	seccomp: prefer SCMP_ACT_KILL_PROCESS if available The upcoming libseccomp release should have SCMP_ACT_KILL_PROCESS action (https://github.com/seccomp/libseccomp/issues/96). SCMP_ACT_KILL_PROCESS is preferable to immediately terminate the offending process, rather than having the SIGSYS handler running. Use SECCOMP_GET_ACTION_AVAIL to check availability of kernel support, as libseccomp will fallback on SCMP_ACT_KILL otherwise, and we still prefer SCMP_ACT_TRAP. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-08-23 16:45:23 +02:00
Marc-André Lureau	6f2231e9b0	seccomp: use SIGSYS signal instead of killing the thread The seccomp action SCMP_ACT_KILL results in immediate termination of the thread that made the bad system call. However, qemu being multi-threaded, it keeps running. There is no easy way for parent process / management layer (libvirt) to know about that situation. Instead, the default SIGSYS handler when invoked with SCMP_ACT_TRAP will terminate the program and core dump. This may not be the most secure solution, but probably better than just killing the offending thread. SCMP_ACT_KILL_PROCESS has been added in Linux 4.14 to improve the situation, which I propose to use by default if available in the next patch. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-08-23 16:45:20 +02:00
Marc-André Lureau	056de1e894	seccomp: allow sched_setscheduler() with SCHED_IDLE policy Current and upcoming mesa releases rely on a shader disk cache. It uses a thread job queue with low priority, set with sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by the "resourcecontrol" seccomp qemu filter. Since it should be safe to allow lowering thread priority, let's allow scheduling thread to idle policy. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-07-12 14:52:39 +02:00
Yi Min Zhao	9d0fdecbad	sandbox: disable -sandbox if CONFIG_SECCOMP undefined If CONFIG_SECCOMP is undefined, the option 'elevateprivileges' remains compiled. This would make libvirt set the corresponding capability and then trigger failure during guest startup. This patch moves the code regarding seccomp command line options to qemu-seccomp.c file and wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP. Because parse_sandbox() is moved into qemu-seccomp.c file, change seccomp_start() to static function. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>	2018-06-01 13:44:15 +02:00
Eduardo Otubo	24f8cdc572	seccomp: add resourcecontrol argument to command line This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It blacklists all process affinity and scheduler priority system calls to avoid any bigger of the process. Signed-off-by: Eduardo Otubo <otubo@redhat.com>	2017-09-15 10:15:06 +02:00
Eduardo Otubo	995a226f88	seccomp: add spawn argument to command line This patch adds [,spawn=deny] argument to `-sandbox on' option. It blacklists fork and execve system calls, avoiding Qemu to spawn new threads or processes. Signed-off-by: Eduardo Otubo <otubo@redhat.com>	2017-09-15 10:15:06 +02:00
Eduardo Otubo	73a1e64725	seccomp: add elevateprivileges argument to command line This patch introduces the new argument [,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows or denies Qemu process to elevate its privileges by blacklisting all set*uid|gid system calls. The 'children' option will let forks and execves run unprivileged. Signed-off-by: Eduardo Otubo <otubo@redhat.com>	2017-09-15 10:15:06 +02:00
Eduardo Otubo	2b716fa6d6	seccomp: add obsolete argument to command line This patch introduces the argument [,obsolete=allow] to the `-sandbox on' option. It allows Qemu to run safely on old system that still relies on old system calls. Signed-off-by: Eduardo Otubo <otubo@redhat.com>	2017-09-15 10:15:05 +02:00
Eduardo Otubo	1bd6152ae2	seccomp: changing from whitelist to blacklist This patch changes the default behavior of the seccomp filter from whitelist to blacklist. By default now all system calls are allowed and a small black list of definitely forbidden ones was created. Signed-off-by: Eduardo Otubo <otubo@redhat.com>	2017-09-15 10:13:35 +02:00
Eduardo Otubo	cf9dc9e480	seccomp: adding getrusage to the whitelist getrusage is used in a number of places throughout the qemu codebase (notably, in crypto/pbkdf.c). Without this syscall being whitelisted, qemu ends up getting killed by the kernel whenever you try to connect to a VNC console. Signed-off-by: Brian Rak <brak@gameservers.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2016-09-21 11:26:02 +02:00
Miroslav Rezanina	8e08f8a4a7	seccomp: adding sysinfo system call to whitelist Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call so qemu using rbd image hang after start when run in sandbox mode. To allow using rbd images in sandbox mode we have to whitelist it. Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2016-04-16 20:27:44 +02:00
James Hogan	81bed73b53	seccomp: Whitelist cacheflush since 2.2.0 not 2.2.3 The cacheflush system call (found on MIPS and ARM) has been included in the libseccomp header since 2.2.0, so include it back to that version. Previously it was only enabled since 2.2.3 since that is when it was enabled properly for ARM. This will allow seccomp support to be enabled for MIPS back to libseccomp 2.2.0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-By: Andrew Jones <drjones@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2016-04-16 20:27:41 +02:00
Peter Maydell	d38ea87ac5	all: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org	2016-02-04 17:41:30 +00:00
Andrew Jones	47d2067af3	seccomp: add cacheflush to whitelist cacheflush is an arm-specific syscall that qemu built for arm uses. Add it to the whitelist, but only if we're linking with a recent enough libseccomp. Signed-off-by: Andrew Jones <drjones@redhat.com>	2015-11-16 09:48:53 +01:00
Eduardo Otubo	f8d82b8eb8	seccomp: add memfd_create to whitelist This is used by memfd code. Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>	2015-10-22 14:34:50 +03:00
Paolo Bonzini	4b45b05549	seccomp: add mlockall to whitelist This is used by "-realtime mlock=on". Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2015-01-23 14:07:08 +01:00
Paul Moore	ea259acae5	seccomp: add mbind() to the syscall whitelist The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to set the policy for a memory range. Add the syscall to the seccomp sandbox whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>	2015-01-05 18:13:38 +01:00
Philipp Gesang	f73adec709	seccomp: whitelist syscalls fallocate(), fadvise64(), inotify_init1() and inotify_add_watch() fallocate() is needed for snapshotting. If it isn’t whitelisted $ qemu-img create -f qcow2 x.qcow 1G Formatting 'x.qcow', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-kvm -display none -monitor stdio -sandbox on x.qcow QEMU 2.1.50 monitor - type 'help' for more information (qemu) savevm foo (qemu) loadvm foo will fail, as will subsequent savevm commands on the same image. fadvise64(), inotify_init1(), inotify_add_watch() are needed by the SDL display. Without the whitelist entries, qemu-kvm -sandbox on fails immediately. In my tests fadvise64() is called 50--51 times per VM run. That number seems independent of the duration of the run. fallocate(), inotify_init1(), inotify_add_watch() are called once each. Accordingly, they are added to the whitelist at a very low priority. Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2014-11-11 17:01:35 +01:00
Paul Moore	b22876cc2f	seccomp: add semctl() to the syscall whitelist QEMU needs to call semctl() for correct operation. This particular problem was identified on shutdown with the following commandline: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>	2014-08-21 10:29:16 +02:00
Paul Moore	e3f9bb011a	seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelist Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>	2014-04-25 14:52:03 -03:00
Felix Geyer	8439761852	seccomp: add timerfd_create and timerfd_settime to the whitelist libusb calls timerfd_create() and timerfd_settime() when it's built with timerfd support. Command to reproduce: -device usb-host,hostbus=1,hostaddr=3,id=hostdev0 Log messages: audit(1390730418.924:135): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=5232 comm="qemu-system-x86" sig=31 syscall=283 compat=0 ip=0x7f2b0f4e96a7 code=0x0 audit(1390733100.580:142): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=16909 comm="qemu-system-x86" sig=31 syscall=286 compat=0 ip=0x7f03513a06da code=0x0 Reading a few hundred MB from a USB drive on x86_64 shows this syscall distribution. Therefore the timerfd_settime priority is set to 242. calls syscall --------- ---------------- 5303600 write 2240554 read 2167030 ppoll 2134828 ioctl 704023 timerfd_settime 689105 poll 83122 futex 803 writev 476 rt_sigprocmask 287 recvmsg 178 brk Signed-off-by: Felix Geyer <debfx@fobos.de> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>	2014-04-25 14:51:59 -03:00
Paul Moore	918b94e287	seccomp: add some basic shared memory syscalls to the whitelist PulseAudio requires the use of shared memory so add shmget(), shmat(), and shmdt() to the syscall whitelist. Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-01-20 11:19:34 -02:00
Paul Moore	0c2acb163f	seccomp: add mkdir() and fchmod() to the whitelist The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on "/run/user/<UID>/pulse" which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir("/run/user/0/pulse", 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>	2014-01-20 11:19:29 -02:00
Corey Bryant	2a13f99112	seccomp: exit if seccomp_init() fails This fixes a bug where we weren't exiting if seccomp_init() failed. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Acked-by: Paul Moore <pmoore@redhat.com>	2013-12-20 16:38:29 -02:00
Paul Moore	e9eecb5bf8	seccomp: add kill() to the syscall whitelist The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP ---- time->Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm="qemu-kvm" sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi <coli@redhat.com> Tested-by: CongLi <coli@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>	2013-12-03 10:21:32 -02:00
Eduardo Otubo	c236f4519c	seccomp: fine tuning whitelist by adding times() This was causing Qemu process to hang when using -sandbox on as described in RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Tested-by: Paul Moore <pmoore@redhat.com> Acked-by: Paul Moore <pmoore@redhat.com>	2013-09-24 15:15:16 -03:00
Paul Moore	d2509b667c	seccomp: add arch_prctl() to the syscall whitelist It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif <switch> $1 Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130718135703.8247.19213.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 19:56:52 -05:00
Paul Moore	94113bd8a1	seccomp: add additional asynchronous I/O syscalls A previous commit, "seccomp: add the asynchronous I/O syscalls to the whitelist", added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130715193201.943.4913.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-29 19:56:52 -05:00
Eduardo Otubo	2fb861eb02	seccomp: removing unused syscalls from whitelist v3 update: - reincluding getrlimit(), it is used by Xen. v2 update: - reincluding setrlimit(), it is used by Xen. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-3-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-26 16:54:08 -05:00
Eduardo Otubo	7d7b2ad436	seccomp: no need to check arch in syscall whitelist v2 update: - set libseccomp 2.1.0 as requirement on configure script. Since libseccomp 2.0 there's no need to check the architecture type anymore. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-2-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-07-26 16:54:08 -05:00
Paul Moore	fd21faadb1	seccomp: add the asynchronous I/O syscalls to the whitelist In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Message-id: 20130529203001.20939.83322.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-05-30 11:46:07 -05:00
Paolo Bonzini	9c17d615a6	softmmu: move include files to include/sysemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:45 +01:00
Eduardo Otubo	fe512d65e0	seccomp: adding new syscalls (bugzilla 855162) According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist when using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Reported-by: Paul Moore <pmoore@redhat.com> Tested-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-11-30 08:27:27 -06:00
Eduardo Otubo	2f668be775	Adding qemu-seccomp.[ch] (v8) Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> --- v1: - I added a syscall struct using priority levels as described in the libseccomp man page. The priority numbers are based on the frequency they appear in a sample strace from a regular qemu guest run under libvirt. Libseccomp generates linear BPF code to filter system calls, those rules are read one after another. The priority system places the most common rules first in order to reduce the overhead when processing them. v1 -> v2: - Fixed some style issues - Removed code from vl.c and created qemu-seccomp.[ch] - Now using ARRAY_SIZE macro - Added more syscalls without priority/frequency set yet v2 -> v3: - Adding copyright and license information - Replacing seccomp_whitelist_count just by ARRAY_SIZE - Adding header protection to qemu-seccomp.h - Moving QemuSeccompSyscall definition to qemu-seccomp.c - Negative return from seccomp_start is fatal now. - Adding open() and execve() to the whitelist v3 -> v4: - Tests revealed a bigger set of syscalls. - seccomp_start() now has an argument to set the mode according to the configure option trap or kill. v4 -> v5: - Tests on x86_64 required a new specific set of system calls. - libseccomp release 1.0.0: part of the API have changed in this last release, had to adapt to the new function signatures.	2012-08-16 13:41:16 -05:00

38 Commits
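
Commit 70dfabeaa7 above checks per-thread filter coverage by grepping the Seccomp field out of each /proc/<pid>/task/<tid>/status file. The same check can be sketched in C. This is a minimal illustration only: the function name seccomp_mode_from_status is hypothetical and is not part of QEMU or libseccomp, and it assumes the caller has already read the status file into a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): return the value of the
 * "Seccomp:" field found in the text of a /proc/<pid>/task/<tid>/status
 * file, or -1 if the field is absent.
 * Kernel values: 0 = disabled, 1 = strict mode, 2 = filter mode.
 */
int seccomp_mode_from_status(const char *status_text)
{
    const char *line = status_text;

    assert(status_text != NULL);

    /* Walk the buffer one line at a time. */
    while (line != NULL && *line != '\0') {
        int mode;

        /* Whitespace in the format also matches the tab after the colon. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1) {
            return mode;
        }

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}
```

A thread reporting 0 while its siblings report 2 corresponds to the situation the commit describes, where a thread created before the filter was installed (such as the RCU thread) stays unfiltered unless TSYNC is used.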