Commit Graph

280 Commits

Author SHA1 Message Date
Anthony Harivel
84369d7621 tools: build qemu-vmsr-helper
Introduce a privileged helper to access RAPL MSR.

The privileged helper tool, qemu-vmsr-helper, is designed to provide
virtual machines with the ability to read specific RAPL (Running Average
Power Limit) MSRs without requiring CAP_SYS_RAWIO privileges or relying
on external, out-of-tree patches.

The helper tool leverages Unix permissions and SO_PEERCRED socket
options to enforce access control, ensuring that only processes
explicitly requesting read access via readmsr() from a valid Thread ID
can access these MSRs.

The list of RAPL MSRs that are allowed to be read by the helper tool is
defined in rapl-msr-index.h. This list corresponds to the RAPL MSRs that
will be supported in the next commit titled "Add support for RAPL MSRs
in KVM/QEMU."

The tool is intentionally designed to run on the Linux x86 platform.
This initial implementation is tailored for Intel CPUs but can be
extended to support AMD CPUs in the future.

Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240522153453.1230389-3-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22 13:50:21 +02:00
Akihiko Odaki
6832aa80a1 ebpf: Add a separate target for skeleton
This generalizes the rule to generate the skeleton and allows to add
another.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
f5c69e7ab2 ebpf: Refactor tun_rss_steering_prog()
This saves branches and makes later BPF program changes easier.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
8dc8220e23 ebpf: Return 0 when configuration fails
The kernel interprets the returned value as an unsigned 32-bit so -1
will mean queue 4294967295, which is awkward. Return 0 instead.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Akihiko Odaki
72fa42cfca ebpf: Fix RSS error handling
calculate_rss_hash() was using hash value 0 to tell if it calculated
a hash, but the hash value may be 0 on a rare occasion. Have a
distinct bool value for correctness.

Fixes: f3fa412de2 ("ebpf: Added eBPF RSS program.")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-06-04 15:14:26 +08:00
Andrew Melnychenko
0cc14182ab ebpf: Updated eBPF program and skeleton.
Updated section name, so libbpf should init/gues proper
program type without specifications during open/load.
Also, added map_flags with explicitly declared BPF_F_MMAPABLE.
Added check for BPF_F_MMAPABLE flag to meson script and
requirements to libbpf version.
Also changed fragmentation flag check - some TCP/UDP packets
may be considered fragmented if DF flag is set.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-03-12 19:31:47 +08:00
Shreesh Adiga
197a137290 ebpf: fix compatibility with libbpf 1.0+
The current implementation fails to load on a system with
libbpf 1.0 and reports that legacy map definitions in 'maps'
section are not supported by libbpf v1.0+. This commit updates
the Makefile to add BTF (-g flag) and appropriately updates
the maps in rss.bpf.c and update the skeleton file in repo.

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-03-10 17:26:47 +08:00
Dr. David Alan Gilbert
e0dc2631ec virtiofsd: Remove source
Now remove all the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-02-16 18:15:08 +00:00
Dr. David Alan Gilbert
8ab5e8a503 virtiofsd: Remove build and docs glue
Remove all the virtiofsd build and docs infrastructure.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-02-16 18:15:08 +00:00
Daniel P. Berrangé
e4418354c0 tools/virtiofsd: add G_GNUC_PRINTF for logging functions
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221219130205.687815-4-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 10:44:34 +01:00
Markus Armbruster
66997c42e0 cleanup: Tweak and re-run return_directly.cocci
Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops a comment in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Change dropped,
will be done manually in the next commit.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in tools/virtiofsd/fuse_lowlevel.c tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221122134917.1217307-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-12-14 16:19:35 +01:00
Marc Hartmayer
c23a956366 virtiofsd: Add sigreturn to the seccomp whitelist
The virtiofsd currently crashes on s390x. This is because of a
`sigreturn` system call. See audit log below:

type=SECCOMP msg=audit(1669382477.611:459): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 pid=6649 comm="virtiofsd" exe="/usr/libexec/virtiofsd" sig=31 arch=80000016 syscall=119 compat=0 ip=0x3fff15f748a code=0x80000000AUID="unset" UID="root" GID="root" ARCH=s390x SYSCALL=sigreturn

Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221125143946.27717-1-mhartmay@linux.ibm.com>
2022-11-25 13:56:05 -05:00
Yusuke Okada
f16d15c927 virtiofsd: use g_date_time_get_microsecond to get subsecond
The "%f" specifier in g_date_time_format() is only available in glib
2.65.2 or later. If combined with older glib, the function returns null
and the timestamp displayed as "(null)".

For backward compatibility, g_date_time_get_microsecond should be used
to retrieve subsecond.

In this patch the g_date_time_format() leaves subsecond field as "%06d"
and let next snprintf to format with g_date_time_get_microsecond.

Signed-off-by: Yusuke Okada <okada.yusuke@jp.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20220818184618.2205172-1-yokada.996@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-09-22 13:13:47 -04:00
Stefan Weil
7b0ca31364 virtiofsd: Fix format strings
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20220804074833.892604-1-sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-08-04 14:44:25 -04:00
Vivek Goyal
a21ba54dd5 virtiofsd: Disable killpriv_v2 by default
We are having bunch of issues with killpriv_v2 enabled by default. First
of all it relies on clearing suid/sgid bits as needed by dropping
capability CAP_FSETID. This does not work for remote filesystems like
NFS (and possibly others).

Secondly, we are noticing other issues related to clearing of SGID
which leads to failures for xfstests generic/355 and generic/193.

Thirdly, there are other issues w.r.t caching of metadata (suid/sgid)
bits in fuse client with killpriv_v2 enabled. Guest can cache that
data for sometime even if cleared on server.

Second and Third issue are fixable. Just that it might take a little
while to get it fixed in kernel. First one will probably not see
any movement for a long time.

Given these issues, killpriv_v2 does not seem to be a good candidate
for enabling by default. We have already disabled it by default in
rust version of virtiofsd.

Hence this patch disabled killpriv_v2 by default. User can choose to
enable it by passing option "-o killpriv_v2".

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <YuPd0itNIAz4tQRt@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-08-02 16:46:52 +01:00
Daniel P. Berrangé
7a21bee2aa misc: fix commonly doubled up words
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220707163720.1421716-5-berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-08-01 11:58:02 +02:00
Dr. David Alan Gilbert
118d4ed045 Trivial: 3 char repeat typos
Inspired by Julia Lawall's fixing of Linux
kernel comments, I looked at qemu, although I did it manually.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Message-Id: <20220614104045.85728-2-dgilbert@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-06-28 11:06:02 +02:00
Markus Armbruster
52581c718c Clean up header guards that don't match their file name
Header guard symbols should match their file name to make guard
collisions less likely.

Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]
2022-05-11 16:49:06 +02:00
Paolo Bonzini
2a3129a376 meson: create have_vhost_* variables
When using Meson options rather than config-host.h, the "when" clauses
have to be changed to if statements (which is not necessarily great,
though at least it highlights which parts of the build are per-target
and which are not).

Do that before moving vhost logic to meson.build, though for now
the variables are just based on config-host.mak data.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-07 07:46:58 +02:00
Marc-André Lureau
bd2142c353 virtiofsd: replace pipe() with g_unix_open_pipe(CLOEXEC)
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2022-05-03 15:47:08 +04:00
Richard Henderson
c49abc8406 Pull request
Small contrib/vhost-user-blk, contrib/vhost-user-scsi, and tools/virtiofsd
 improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmJmYGYACgkQnKSrs4Gr
 c8gNIAgAgCEeBMP61cdT8DGBBw26abmrNmCCjXYL3rNcR2GNsn0x9VbedBhSPt9O
 z+/nej9UkRKHgQ/+V1LqWD2D/TU327nLQ74z1JJvGtjWhvM18XTTAeh1BQbVywKU
 z+o6WSyP22Xx87cUIuOGGMgNDDfIY2j/t5sU8eR+lxXxDuKXx3tulTV65QlNSw9z
 19rb8eJkaau5YWhN5gPEI65O/YVgGUtA+c5z39AoBG85XAAhm+6+mTFfuy8J8gp/
 wqr61+xB7bB3AxIOv1/0PWCl3F/+kPs7ybJRGkHMNtKyJtp34Y86kwsVEBtOMGVO
 wm/ht7FMy2GhnaKGjNMtvJm29ZArqA==
 =zZcV
 -----END PGP SIGNATURE-----

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

Small contrib/vhost-user-blk, contrib/vhost-user-scsi, and tools/virtiofsd
improvements.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmJmYGYACgkQnKSrs4Gr
# c8gNIAgAgCEeBMP61cdT8DGBBw26abmrNmCCjXYL3rNcR2GNsn0x9VbedBhSPt9O
# z+/nej9UkRKHgQ/+V1LqWD2D/TU327nLQ74z1JJvGtjWhvM18XTTAeh1BQbVywKU
# z+o6WSyP22Xx87cUIuOGGMgNDDfIY2j/t5sU8eR+lxXxDuKXx3tulTV65QlNSw9z
# 19rb8eJkaau5YWhN5gPEI65O/YVgGUtA+c5z39AoBG85XAAhm+6+mTFfuy8J8gp/
# wqr61+xB7bB3AxIOv1/0PWCl3F/+kPs7ybJRGkHMNtKyJtp34Y86kwsVEBtOMGVO
# wm/ht7FMy2GhnaKGjNMtvJm29ZArqA==
# =zZcV
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 25 Apr 2022 01:48:38 AM PDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  virtiofsd: Add docs/helper for killpriv_v2/no_killpriv_v2 option
  contrib/vhost-user-blk: add missing GOptionEntry NULL terminator
  Implements Backend Program conventions for vhost-user-scsi

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-04-25 10:21:56 -07:00
Marc-André Lureau
1fbf2665e6 util: replace qemu_get_local_state_pathname()
Simplify the function to only return the directory path. Callers are
adjusted to use the GLib function to build paths, g_build_filename().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220420132624.2439741-39-marcandre.lureau@redhat.com>
2022-04-21 17:09:09 +04:00
Marc-André Lureau
49f9522193 include: rename qemu-common.h qemu/help-texts.h
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Message-Id: <20220420132624.2439741-7-marcandre.lureau@redhat.com>
2022-04-21 16:58:24 +04:00
Liu Yiding
d45c83328f virtiofsd: Add docs/helper for killpriv_v2/no_killpriv_v2 option
virtiofsd has introduced killpriv_v2/no_killpriv_v2 for a while. Add
description of it to docs/helper.

Signed-off-by: Liu Yiding <liuyd.fnst@fujitsu.com>
Message-Id: <20220421095151.2231099-1-liuyd.fnst@fujitsu.com>

[Small documentation fixes: s/as client supports/as the client supports/
and s/.  /. /.
--Stefan]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-04-21 12:05:15 +02:00
Hanna Reitz
4ce7a08d3e virtiofsd: Let meson check for statx.stx_mnt_id
In virtiofsd, we assume that the presence of the STATX_MNT_ID macro
implies existence of the statx.stx_mnt_id field.  Unfortunately, that is
not necessarily the case: glibc has introduced the macro in its commit
88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field
is still missing from its own headers.

Let meson.build actually chek for both STATX_MNT_ID and
statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present.
Then, use this config macro in virtiofsd.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/882
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220223092340.9043-1-hreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02 18:12:40 +00:00
Peter Maydell
922268067f * More Meson conversions (0.59.x now required rather than suggested)
* UMIP support for TCG x86
 * Fix migration crash
 * Restore error output for check-block
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmITXP8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOsdQf/Srx+8BImb+LtRpiKHhn4SiucGSe8
 EhEAPSnblbvIGK9BYfj953svDzlLN2JIADcmOI59QE2xsPEtxLlEmMlvg/JIUMQp
 jk07oxbVXdv4olTyECmO3hj2VbSG7VR3tP9TOuJA5Vi4a+VzYXc6zv1/mp/8rdnl
 pGW0pjBZTXSp2Z/Be9/aGN8IuW+GnQuVZDXWBuEJmz2UzcdPWaOUVDro7IaUXmqp
 eB4XcT0jPR5uKetA1R1cyHCUVd7P0v6UV8SLYj905H1a8sqxDWMiUzX6fKkoN0SJ
 r/y7kCuyNzpxoWRuA2KN6Q5f9kAlMI/j9H3ih0wUfEkauiPtTATAc1+s+Q==
 =sSBY
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* More Meson conversions (0.59.x now required rather than suggested)
* UMIP support for TCG x86
* Fix migration crash
* Restore error output for check-block

# gpg: Signature made Mon 21 Feb 2022 09:35:59 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (29 commits)
  configure, meson: move CONFIG_IASL to a Meson option
  meson, configure: move ntddscsi API check to meson
  meson: require dynamic linking for VSS support
  qga/vss-win32: require widl/midl, remove pre-built TLB file
  meson: do not make qga/vss-win32/meson.build conditional on C++ presence
  configure, meson: replace VSS SDK checks and options with --enable-vss-sdk
  qga/vss: use standard windows headers location
  qga/vss-win32: use widl if available
  meson: drop --with-win-sdk
  qga/vss-win32: fix midl arguments
  meson: refine check for whether to look for virglrenderer
  configure, meson: move guest-agent, tools to meson
  configure, meson: move smbd options to meson_options.txt
  configure, meson: move coroutine options to meson_options.txt
  configure, meson: move some default-disabled options to meson_options.txt
  meson: define qemu_cflags/qemu_ldflags
  configure, meson: move block layer options to meson_options.txt
  configure, meson: move image format options to meson_options.txt
  configure, meson: cleanup qemu-ga libraries
  configure, meson: move TPM check to meson
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-02-21 17:24:05 +00:00
Greg Kurz
45b04ef48d virtiofsd: Add basic support for FUSE_SYNCFS request
Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file is needed because O_PATH descriptors aren't
backed by an actual file and syncfs() would fail with EBADF.

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
only sends a single FUSE_SYNCFS request for the root inode. The
server would thus need to track submounts internally and call
syncfs() on each of them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220215181529.164070-2-groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
963061dc11 virtiofsd: Add an option to enable/disable security label
Provide an option "-o security_label/no_security_label" to enable/disable
security label functionality. By default these are turned off.

If enabled, server will indicate to client that it is capable of handling
one security label during file creation. Typically this is expected to
be a SELinux label. File server will set this label on the file. It will
try to set it atomically wherever possible. But its not possible in
all the cases.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-11-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
a675c9a600 virtiofsd: Create new file using O_TMPFILE and set security context
If guest and host policies can't work with each other, then guest security
context (selinux label) needs to be set into an xattr. Say remap guest
security.selinux xattr to trusted.virtiofs.security.selinux.

That means setting "fscreate" is not going to help as that's ony useful
for security.selinux xattr on host.

So we need another method which is atomic. Use O_TMPFILE to create new
file, set xattr and then linkat() to proper place.

But this works only for regular files. So dir, symlinks will continue
to be non-atomic.

Also if host filesystem does not support O_TMPFILE, we fallback to
non-atomic behavior.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-10-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
0c3f81e131 virtiofsd: Create new file with security context
This patch adds support for creating new file with security context
as sent by client. It basically takes three paths.

- If no security context enabled, then it continues to create files without
  security context.

- If security context is enabled and but security.selinux has not been
  remapped, then it uses /proc/thread-self/attr/fscreate knob to set
  security context and then create the file. This will make sure that
  newly created file gets the security context as set in "fscreate" and
  this is atomic w.r.t file creation.

  This is useful and host and guest SELinux policies don't conflict and
  can work with each other. In that case, guest security.selinux xattr
  is not remapped and it is passthrough as "security.selinux" xattr
  on host.

- If security context is enabled but security.selinux xattr has been
  remapped to something else, then it first creates the file and then
  uses setxattr() to set the remapped xattr with the security context.
  This is a non-atomic operation w.r.t file creation.

  This mode will be most versatile and allow host and guest to have their
  own separate SELinux xattrs and have their own separate SELinux policies.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-9-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
cb282e556a virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
Soon we will be able to create and also set security context on the file
atomically using /proc/self/task/tid/attr/fscreate knob. If this knob
is available on the system, first set the knob with the desired context
and then create the file. It will be created with the context set in
fscreate. This works basically for SELinux and its per thread.

This patch just introduces the helper functions. Subsequent patches will
make use of these helpers.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-8-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manually merged gettid syscall number fixup from Vivek
2022-02-17 17:22:26 +00:00
Vivek Goyal
81489726ad virtiofsd: Move core file creation code in separate function
Move core file creation bits in a separate function. Soon this is going
to get more complex as file creation need to set security context also.
And there will be multiple modes of file creation in next patch.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-7-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
36cfab870e virtiofsd, fuse_lowlevel.c: Add capability to parse security context
Add capability to enable and parse security context as sent by client
and put into fuse_req. Filesystems now can get security context from
request and set it on files during creation.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-6-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
4c7c393c7b virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
->capable keeps track of what capabilities kernel supports and ->wants keep
track of what capabilities filesytem wants.

Right now these fields are 32bit in size. But now fuse has run out of
bits and capabilities can now have bit number which are higher than 31.

That means 32 bit fields are not suffcient anymore. Increase size to 64
bit so that we can add newer capabilities and still be able to use existing
code to check and set the capabilities.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-5-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:22:26 +00:00
Vivek Goyal
776dc4b165 virtiofsd: Parse extended "struct fuse_init_in"
Add some code to parse extended "struct fuse_init_in". And use a local
variable "flag" to represent 64 bit flags. This will make it easier
to add more features without having to worry about two 32bit flags (->flags
and ->flags2) in "fuse_struct_in".

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-4-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed up long line
2022-02-17 17:22:19 +00:00
Vivek Goyal
a086d54c6f virtiofsd: Fix breakage due to fuse_init_in size change
Kernel version 5.17 has increased the size of "struct fuse_init_in" struct.
Previously this struct was 16 bytes and now it has been extended to
64 bytes in size.

Once qemu headers are updated to latest, it will expect to receive 64 byte
size struct (for protocol version major 7 and minor > 6). But if guest is
booting older kernel (older than 5.17), then it still sends older
fuse_init_in of size 16 bytes. And do_init() fails. It is expecting
64 byte struct. And this results in mount of virtiofs failing.

Fix this by parsing 16 bytes only for now. Separate patches will be
posted which will parse rest of the bytes and enable new functionality.
Right now we don't support any of the new functionality, so we don't
lose anything by not parsing bytes beyond 16.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20220208204813.682906-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-02-17 17:21:43 +00:00
Sebastian Hasler
41af4459ac virtiofsd: Do not support blocking flock
With the current implementation, blocking flock can lead to
deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts
to perform a blocking flock request.

Signed-off-by: Sebastian Hasler <sebastian.hasler@stuvus.uni-stuttgart.de>
Message-Id: <20220113153249.710216-1-sebastian.hasler@stuvus.uni-stuttgart.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2022-02-16 17:29:31 +00:00
Paolo Bonzini
a436d6d412 meson: use .require() and .disable_auto_if() method for features
The method is now in 0.59, using it simplifies some conditionals.

There is a small change, which is to build virtfs-proxy-helper in a
tools-only build.  This is done for consistency with other tools,
which are not culled by the absence of system emulator binaries.

.disable_auto_if() would also be useful to check for packages,
for example

-linux_io_uring = not_found
-if not get_option('linux_io_uring').auto() or have_block
-  linux_io_uring = dependency('liburing', required: get_option('linux_io_uring'),
-                              method: 'pkg-config', kwargs: static_kwargs)
-endif
+linux_io_uring = dependency('liburing',
+  required: get_option('linux_io_uring').disable_auto_if(not have_block),
+  method: 'pkg-config', kwargs: static_kwargs)

This change however is much larger and I am not sure about the improved
readability, so I am not performing it right now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-16 15:01:33 +01:00
Christian Ehrhardt
7b223e3860 tools/virtiofsd: Add rseq syscall to the seccomp allowlist
The virtiofsd currently crashes when used with glibc 2.35.
That is due to the rseq system call being added to every thread
creation [1][2].

[1]: https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/
[2]: https://sourceware.org/pipermail/libc-alpha/2022-February/136040.html

This happens not at daemon start, but when a guest connects

    /usr/lib/qemu/virtiofsd -f --socket-path=/tmp/testvfsd -o sandbox=chroot \
        -o source=/var/guests/j-virtiofs --socket-group=kvm
    virtio_session_mount: Waiting for vhost-user socket connection...
    # start ok, now guest will connect
    virtio_session_mount: Received vhost-user socket connection
    virtio_loop: Entry
    fv_queue_set_started: qidx=0 started=1
    fv_queue_set_started: qidx=1 started=1
    Bad system call (core dumped)

We have to put rseq on the seccomp allowlist to avoid that the daemon
is crashing in this case.

Reported-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20220209111456.3328420-1-christian.ehrhardt@canonical.com

[Moved rseq to its alphabetically ordered position in the seccomp
allowlist.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-02-14 17:11:20 +00:00
Vivek Goyal
449e8171f9 virtiofsd: Drop membership of all supplementary groups (CVE-2022-0358)
At the start, drop membership of all supplementary groups. This is
not required.

If we have membership of "root" supplementary group and when we switch
uid/gid using setresuid/setsgid, we still retain membership of existing
supplemntary groups. And that can allow some operations which are not
normally allowed.

For example, if root in guest creates a dir as follows.

$ mkdir -m 03777 test_dir

This sets SGID on dir as well as allows unprivileged users to write into
this dir.

And now as unprivileged user open file as follows.

$ su test
$ fd = open("test_dir/priviledge_id", O_RDWR|O_CREAT|O_EXCL, 02755);

This will create SGID set executable in test_dir/.

And that's a problem because now an unpriviliged user can execute it,
get egid=0 and get access to resources owned by "root" group. This is
privilege escalation.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2044863
Fixes: CVE-2022-0358
Reported-by: JIETAO XIAO <shawtao1125@gmail.com>
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <YfBGoriS38eBQrAb@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Fixed missing {}'s style nit
2022-01-26 10:32:05 +00:00
Dr. David Alan Gilbert
555a76e5e5 virtiofsd: Error on bad socket group name
Make the '--socket-group=' option fail if the group name is unknown:

./tools/virtiofsd/virtiofsd .... --socket-group=zaphod
vhost socket: unable to find group 'zaphod'

Reported-by: Xiaoling Gao <xiagao@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211014122554.34599-1-dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-25 19:38:32 +01:00
Vivek Goyal
50cf6d6cb7 virtiofsd: Add a helper to stop all queues
Use a helper to stop all the queues. Later in the patch series I am
planning to use this helper at one more place later in the patch series.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210930153037.1194279-6-vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-25 18:58:42 +01:00
Vivek Goyal
c68276556a virtiofsd: Add a helper to send element on virtqueue
We have open coded logic to take locks and push element on virtqueue at
three places. Add a helper and use it everywhere. Code is easier to read and
less number of lines of code.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210930153037.1194279-5-vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-25 18:58:20 +01:00
Vivek Goyal
a88abc6f84 virtiofsd: Remove unused virtio_fs_config definition
"struct virtio_fs_config" definition seems to be unused in fuse_virtio.c.
Remove it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210930153037.1194279-4-vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-25 18:58:02 +01:00
Vivek Goyal
5afc8df46c virtiofsd: xattr mapping add a new type "unsupported"
Right now for xattr remapping, we support types of "prefix", "ok" or "bad".
Type "bad" returns -EPERM on setxattr and hides xattr in listxattr. For
getxattr, mapping code returns -EPERM but getxattr code converts it to -ENODATA.

I need a new semantics where if an xattr is unsupported, then
getxattr()/setxattr() return -ENOTSUP and listxattr() should hide the xattr.
This is needed to simulate that security.selinux is not supported by
virtiofs filesystem and in that case client falls back to some default
label specified by policy.

So add a new type "unsupported" which returns -ENOTSUP on getxattr() and
setxattr() and hides xattrs in listxattr().

For example, one can use following mapping rule to not support
security.selinux xattr and allow others.

"-o xattrmap=/unsupported/all/security.selinux/security.selinux//ok/all///"

Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <YUt9qbmgAfCFfg5t@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-25 18:48:23 +01:00
Peter Maydell
7adb961995 virtiofsd pull 2021-08-16
Two minor fixes; one for performance, the other seccomp
 on s390x.
 
 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmFDS+oACgkQBRYzHrxb
 /edhFg//Wldr2jMom7TlbgkhsbjPQbdbsbdHx7rWXviq3bqMq2cB8UzcHvhUFkW4
 j15ElKDO2ZFsjtjaFQLtQmuO4zoMN/V4u8c/0V93b/vaAh4IgMYiDXaZ18fpuYi9
 Y02tiTcTUyUj4fv5OUoUeynNUkkzgxGrL8Q5oZK3KSHn5uwFOWgnZiAycbECKzJH
 wNljEpXjyMcZoIUJvoJ9oO246be3Flo82eA8UVOj+O0MMb2/tl18DL5IGOnglBYB
 U/9CkFoa5qHfiIQ63/OsndHBMXSTZiOqnv+S9RSbAdoZRTjVP1BjFYvNjnzz9/Nv
 czHfM+ecjLxNzG7WSHOrdm8D+L3E4O42Xuf6umcib1KfV6l/giQ3V9WfqfEX1JA4
 V6XpZ5C25rs6AqFwbbh/Eo+wvb2syck30sXbwh7C1oDK/nfwHDgegd1EPE9VUaxi
 c0yoqV/ScHfo2fbK0QVkIt2mwi8lcjH3cg2gSKfZ0YiHcYBqC7RB9IonndEkIoqd
 mdvAdGafD27dDEsm1OkSxBDItmE+BZ7C2+7OF5x+a/rnt7L+yQszTG/A0DNH23kD
 ktvTjEBLx2jzhmR+4My5YnTYoK0rE3zs0Nh2zxg7Kx9Sh3LlMtdN5o2GS0USq0wm
 YCT4EfNAhPumPi+59mOCybgV6tayxz/ihgjAymQONAcRrTwddyo=
 =GMdE
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916' into staging

virtiofsd pull 2021-08-16

Two minor fixes; one for performance, the other seccomp
on s390x.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

# gpg: Signature made Thu 16 Sep 2021 14:51:38 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916:
  virtiofsd: Reverse req_list before processing it
  tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-19 18:53:29 +01:00
Sergio Lopez
046d91c83c virtiofsd: Reverse req_list before processing it
With the thread pool disabled, we add the requests in the queue to a
GList, processing by iterating over there afterwards.

For adding them, we're using "g_list_prepend()", which is more
efficient but causes the requests to be processed in reverse order,
breaking the read-ahead and request-merging optimizations in the host
for sequential operations.

According to the documentation, if you need to process the request
in-order, using "g_list_prepend()" and then reversing the list with
"g_list_reverse()" is more efficient than using "g_list_append()", so
let's do it that way.

Testing on a spinning disk (to boost the increase of read-ahead and
request-merging) shows a 4x improvement on sequential write fio test:

Test:
fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20
--iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio
--rw write --name seqwrite-libaio

Without "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021
  write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets
...

With "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021
  write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets
...

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210824131158.39970-1-slp@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-16 14:50:48 +01:00
Thomas Huth
8cfd339b3d tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist
The virtiofsd currently crashes on s390x when doing something like
this in the guest:

 mkdir -p /mnt/myfs
 mount -t virtiofs myfs /mnt/myfs
 touch /mnt/myfs/foo.txt
 stat -f /mnt/myfs/foo.txt

The problem is that the fstatfs64 syscall is called in this case
from the virtiofsd. We have to put it on the seccomp allowlist to
avoid that the daemon gets killed in this case.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210914123214.181885-1-thuth@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-16 14:50:48 +01:00
Michael Tokarev
68857f13aa spelling: sytem => system
Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <fefb5f5c-82bc-05e2-b4c1-665e9d6896ff@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-15 15:51:07 +02:00
Hubert Jasudowicz
7ef2408a96 virtiofsd: Add missing newline in error message
Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <e5914ad202a13e9c1bc2a5efa267ff3bd4f48db6.1625173475.git.hubert.jasudowicz@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-07-09 18:42:46 +02:00