Commit Graph

241 Commits

Author SHA1 Message Date
Greg Kurz
5bb8327b65 virtiofsd: Release vu_dispatch_lock when stopping queue
QEMU can stop a virtqueue by sending a VHOST_USER_GET_VRING_BASE request
to virtiofsd. As with all other vhost-user protocol messages, the thread
that runs the main event loop in virtiofsd takes the vu_dispatch lock in
write mode. This ensures that no other thread can access virtqueues or
memory tables at the same time.

In the case of VHOST_USER_GET_VRING_BASE, the main thread basically
notifies the queue thread that it should terminate and waits for its
termination:

main()
 virtio_loop()
  vu_dispatch_wrlock()
  vu_dispatch()
   vu_process_message()
    vu_get_vring_base_exec()
     fv_queue_cleanup_thread()
      pthread_join()

Unfortunately, the queue thread ends up calling virtio_send_msg()
at some point, which itself needs to grab the lock:

fv_queue_thread()
 g_list_foreach()
  fv_queue_worker()
   fuse_session_process_buf_int()
    do_release()
     lo_release()
      fuse_reply_err()
       send_reply()
        send_reply_iov()
         fuse_send_reply_iov_nofree()
          fuse_send_msg()
           virtio_send_msg()
            vu_dispatch_rdlock() <-- Deadlock !

Simply have the main thread to release the lock before going to
sleep and take it back afterwards. A very similar patch was already
sent by Vivek Goyal sometime back:

https://listman.redhat.com/archives/virtio-fs/2021-January/msg00073.html

The only difference here is that this done in fv_queue_set_started()
because fv_queue_cleanup_thread() can also be called from virtio_loop()
without the lock being held.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210312092212.782255-8-groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15 20:01:55 +00:00
Dr. David Alan Gilbert
e586edcb41 virtiofs: drop remapped security.capability xattr as needed
On Linux, the 'security.capability' xattr holds a set of
capabilities that can change when an executable is run, giving
a limited form of privilege escalation to those programs that
the writer of the file deemed worthy.

Any write causes the 'security.capability' xattr to be dropped,
stopping anyone from gaining privilege by modifying a blessed
file.

Fuse relies on the daemon to do this dropping, and in turn the
daemon relies on the host kernel to drop the xattr for it.  However,
with the addition of -o xattrmap, the xattr that the guest
stores its capabilities in is now not the same as the one that
the host kernel automatically clears.

Where the mapping changes 'security.capability', explicitly clear
the remapped name to preserve the same behaviour.

This bug is assigned CVE-2021-20263.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
2021-03-04 10:26:16 +00:00
Vivek Goyal
26ec190964 virtiofsd: Do not use a thread pool by default
Currently we created a thread pool (With 64 max threads per pool) for
each virtqueue. We hoped that this will provide us with better scalability
and performance.

But in practice, we are getting better numbers in most of the cases
when we don't create a thread pool at all and a single thread per
virtqueue receives the request and processes it.

Hence, I am proposing that we switch to no thread pool by default
(equivalent of --thread-pool-size=0). This will provide out of
box better performance to most of the users. In fact other users
have confirmed that not using a thread pool gives them better
numbers. So why not use this as default. It can be changed when
somebody can fix the issues with thread pool performance.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210210182744.27324-2-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:54:18 +00:00
Vivek Goyal
d64907acbf viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2
This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd
can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2".
By default this is enabled as long as client supports it

Enabling this option helps with performance in write path. Without this
option, currently every write is first preceeded with a getxattr() operation
to find out if security.capability is set. (Write is supposed to clear
security.capability). With this option enabled, server is signing up for
clearing security.capability on every WRITE and also clearing suid/sgid
subject to certain rules. This gets rid of extra getxattr() call for every
WRITE and improves performance. This is true when virtiofsd is run with
option -o xattr.

What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server implementation.
It needs to adhere to following rules. Thanks to Miklos for this summary.

- clear "security.capability" on write, truncate and chown unconditionally
- clear suid/sgid in case of following. Note, sgid is cleared only if
  group executable bit is set.
    o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
    o setattr has FATTR_UID or FATTR_GID
    o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
    o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
    o write has FUSE_WRITE_KILL_SUIDGID

>From Linux VFS client perspective, here are the requirements.

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID as well as file has group execute
  permission.

virtiofsd implementation has not changed much to adhere to above ruls. And
reason being that current assumption is that we are running on Linux
and on top of filesystems like ext4/xfs which already follow above rules.
On write, truncate, chown, seucurity.capability is cleared. And virtiofsd
drops CAP_FSETID if need be and that will lead to clearing of suid/sgid.

But if virtiofsd is running on top a filesystem which breaks above assumptions,
then it will have to take extra actions to emulate above. That's a TODO
for later when need arises.

Note: create normally is supposed to be called only when file does not
      exist. So generally there should not be any question of clearing
      setuid/setgid. But it is possible that after client checks that
      file is not present, some other client creates file on server
      and this race can trigger sending FUSE_CREATE. In that case, if
      O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID
      is also set.

v3:
  - Resolved conflicts due to lo_inode_open() changes.
  - Moved capability code in lo_do_open() so that both lo_open() and
    lo_create() can benefit from common code.
  - Dropped changes to kernel headers as these are part of qemu already.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210208224024.43555-3-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:03:09 +00:00
Vivek Goyal
1e08f164e9 virtiofsd: Save error code early at the failure callsite
Change error code handling slightly in lo_setattr(). Right now we seem
to jump to out_err and assume that "errno" is valid and use that to
send reply.

But if caller has to do some other operations before jumping to out_err,
then it does the dance of first saving errno to saverr and the restore
errno before jumping to out_err. This makes it more confusing.

I am about to make more changes where caller will have to do some
work after error before jumping to out_err. I found it easier to
change the convention a bit. That is caller saves error in "saverr"
before jumping to out_err. And out_err uses "saverr" to send error
back and does not rely on "errno" having actual error.

v3: Resolved conflicts in lo_setattr() due to lo_inode_open() changes.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210208224024.43555-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:03:09 +00:00
Philippe Mathieu-Daudé
a65963efa3 tools/virtiofsd: Replace the word 'whitelist'
Follow the inclusive terminology from the "Conscious Language in your
Open Source Projects" guidelines [*] and replace the words "whitelist"
appropriately.

[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210205171817.2108907-3-philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:03:09 +00:00
Greg Kurz
525a3030a8 virtiofsd: vu_dispatch locking should never fail
pthread_rwlock_rdlock() and pthread_rwlock_wrlock() can fail if a
deadlock condition is detected or the current thread already owns
the lock. They can also fail, like pthread_rwlock_unlock(), if the
mutex wasn't properly initialized. None of these are ever expected
to happen with fv_VuDev::vu_dispatch_rwlock.

Some users already check the return value and assert, some others
don't. Introduce rdlock/wrlock/unlock wrappers that just do the
former and use them everywhere for improved consistency and
robustness.

This is just cleanup. It doesn't fix any actual issue.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210203182434.93870-1-groug@kaod.org>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:03:09 +00:00
Wainer dos Santos Moschetta
0958ee89b6 virtiofsd: Allow to build it without the tools
This changed the Meson build script to allow virtiofsd be built even
though the tools build is disabled, thus honoring the --enable-virtiofsd
option.

Fixes: cece116c93 (configure: add option for virtiofsd)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-Id: <20210201211456.1133364-2-wainersm@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16 17:03:09 +00:00
Greg Kurz
cf269ff803 virtiofsd: Add restart_syscall to the seccomp whitelist
This is how linux restarts some system calls after SIGSTOP/SIGCONT.
This is needed to avoid virtiofsd termination when resuming execution
under GDB for example.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210201193305.136390-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04 17:50:08 +00:00
Greg Kurz
62124e5080 virtiofsd: Add _llseek to the seccomp whitelist
This is how glibc implements lseek(2) on POWER.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1917692
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210121171540.1449777-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04 17:49:17 +00:00
Stefan Hajnoczi
a3fdbbc7f2 virtiofsd: prevent opening of special files (CVE-2020-35517)
A well-behaved FUSE client does not attempt to open special files with
FUSE_OPEN because they are handled on the client side (e.g. device nodes
are handled by client-side device drivers).

The check to prevent virtiofsd from opening special files is missing in
a few cases, most notably FUSE_OPEN. A malicious client can cause
virtiofsd to open a device node, potentially allowing the guest to
escape. This can be exploited by a modified guest device driver. It is
not exploitable from guest userspace since the guest kernel will handle
special files inside the guest instead of sending FUSE requests.

This patch fixes this issue by introducing the lo_inode_open() function
to check the file type before opening it. This is a short-term solution
because it does not prevent a compromised virtiofsd process from opening
device nodes on the host.

Restructure lo_create() to try O_CREAT | O_EXCL first. Note that O_CREAT
| O_EXCL does not follow symlinks, so O_NOFOLLOW masking is not
necessary here. If the file exists and the user did not specify O_EXCL,
open it via lo_do_open().

Reported-by: Alex Xu <alex@alxu.ca>
Fixes: CVE-2020-35517
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-4-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04 17:34:55 +00:00
Stefan Hajnoczi
22d2ece71e virtiofsd: optionally return inode pointer from lo_do_lookup()
lo_do_lookup() finds an existing inode or allocates a new one. It
increments nlookup so that the inode stays alive until the client
releases it.

Existing callers don't need the struct lo_inode so the function doesn't
return it. Extend the function to optionally return the inode. The next
commit will need it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210204150208.367837-3-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04 17:34:35 +00:00
Stefan Hajnoczi
8afaaee976 virtiofsd: extract lo_do_open() from lo_open()
Both lo_open() and lo_create() have similar code to open a file. Extract
a common lo_do_open() function from lo_open() that will be used by
lo_create() in a later commit.

Since lo_do_open() does not otherwise need fuse_req_t req, convert
lo_add_fd_mapping() to use struct lo_data *lo instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-2-stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04 17:32:53 +00:00
Paolo Bonzini
727c8bb809 cap_ng: convert to meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-06 10:21:20 +01:00
Paolo Bonzini
90835c2b81 seccomp: convert to meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-06 10:21:20 +01:00
Alex Chen
03350a1e8d virtiofsd: Remove useless code about send_notify_iov
The 'ch' will be NULL in the following stack:
send_notify_iov()->fuse_send_msg()->virtio_send_msg(), and
this may lead to NULL pointer dereferenced in virtio_send_msg().
But send_notify_iov() was never called, so remove the useless code
about send_notify_iov() to fix this problem.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
Message-Id: <20201214121615.29967-1-alex.chen@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Laszlo Ersek
d6211148f6 virtiofsd: update FUSE_FORGET comment on "lo_inode.nlookup"
Miklos confirms it's *only* the FUSE_FORGET request that the client can
use for decrementing "lo_inode.nlookup".

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: 1222f01555
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20201208073936.8629-1-lersek@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Vivek Goyal
31a4990f8d virtiofsd: Check file type in lo_flush()
Currently lo_flush() is written in such a way that it expects to receive
a FLUSH requests on a regular file (and not directories). For example,
we call lo_fi_fd() which searches lo->fd_map. If we open directories
using opendir(), we keep don't keep track of these in lo->fd_map instead
we keep them in lo->dir_map. So we expect lo_flush() to be called on
regular files only.

Even linux fuse client calls FLUSH only for regular files and not
directories. So put a check for filetype and return EBADF if
lo_flush() is called on a non-regular file.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20201211142544.GB3285@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Vivek Goyal
e7e8aa8aea virtiofsd: Disable posix_lock hash table if remote locks are not enabled
If remote posix locks are not enabled (lo->posix_lock == false), then disable
code paths taken to initialize inode->posix_lock hash table and corresponding
destruction and search etc.

lo_getlk() and lo_setlk() have been modified to return ENOSYS if daemon
does not support posix lock but client still sends a lock/unlock request.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20201207183021.22752-3-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Vivek Goyal
ad3bfe1bd6 virtiofsd: Set up posix_lock hash table for root inode
We setup per inode hash table ->posix_lock to support remote posix locks.
But we forgot to initialize this table for root inode.

Laszlo managed to trigger an issue where he sent a FUSE_FLUSH request for
root inode and lo_flush() found inode with inode->posix_lock NULL and
accessing this table crashed virtiofsd.

May be we can get rid of initializing this hash table for directory
objects completely. But that optimization is for another day.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20201207195539.GB3107@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Laszlo Ersek
bebc3c24aa virtiofsd: make the debug log timestamp on stderr more human-readable
The current timestamp format doesn't help me visually notice small jumps
in time ("small" as defined on human scale, such as a few seconds or a few
ten seconds). Replace it with a local time format where such differences
stand out.

Before:

> [13316826770337] [ID: 00000004] unique: 62, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 1
> [13316826778175] [ID: 00000004]    unique: 62, success, outsize: 16
> [13316826781156] [ID: 00000004] virtio_send_msg: elem 0: with 1 in desc of length 16
> [15138279317927] [ID: 00000001] virtio_loop: Got VU event
> [15138279504884] [ID: 00000001] fv_queue_set_started: qidx=1 started=0
> [15138279519034] [ID: 00000003] fv_queue_thread: kill event on queue 1 - quitting
> [15138280876463] [ID: 00000001] fv_remove_watch: TODO! fd=9
> [15138280897381] [ID: 00000001] virtio_loop: Waiting for VU event
> [15138280946834] [ID: 00000001] virtio_loop: Got VU event
> [15138281175421] [ID: 00000001] virtio_loop: Waiting for VU event
> [15138281182387] [ID: 00000001] virtio_loop: Got VU event
> [15138281189474] [ID: 00000001] virtio_loop: Waiting for VU event
> [15138309321936] [ID: 00000001] virtio_loop: Unexpected poll revents 11
> [15138309434150] [ID: 00000001] virtio_loop: Exit

(Notice how you don't (easily) notice the gap in time after
"virtio_send_msg", and especially the amount of time passed is hard to
estimate.)

After:

> [2020-12-08 06:43:22.58+0100] [ID: 00000004] unique: 51, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 1
> [2020-12-08 06:43:22.58+0100] [ID: 00000004]    unique: 51, success, outsize: 16
> [2020-12-08 06:43:22.58+0100] [ID: 00000004] virtio_send_msg: elem 0: with 1 in desc of length 16
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] fv_queue_set_started: qidx=1 started=0
> [2020-12-08 06:43:29.34+0100] [ID: 00000003] fv_queue_thread: kill event on queue 1 - quitting
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] fv_remove_watch: TODO! fd=9
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Got VU event
> [2020-12-08 06:43:29.34+0100] [ID: 00000001] virtio_loop: Waiting for VU event
> [2020-12-08 06:43:29.37+0100] [ID: 00000001] virtio_loop: Unexpected poll revents 11
> [2020-12-08 06:43:29.37+0100] [ID: 00000001] virtio_loop: Exit

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20201208055043.31548-1-lersek@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Vivek Goyal
e49393a349 virtiofsd: Use --thread-pool-size=0 to mean no thread pool
Right now we create a thread pool and main thread hands over the request
to thread in thread pool to process. Number of threads in thread pool
can be managed by option --thread-pool-size.

In tests we have noted that many of the workloads are getting better
performance if we don't use a thread pool at all and process all
the requests in the context of a thread receiving the request.

Hence give user an option to be able to run virtiofsd without using
a thread pool.

To implement this, I have used existing option --thread-pool-size. This
option defines how many maximum threads can be in the thread pool.
Thread pool size zero freezes thead pool. I can't see why will one
start virtiofsd with a frozen thread pool (hence frozen file system).
So I am redefining --thread-pool-size=0 to mean, don't use a thread pool.
Instead process the request in the context of thread receiving request
from the queue.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20201109143548.GA1479853@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18 10:08:24 +00:00
Marc-André Lureau
db5deef996 virtiofsd: replace _Static_assert with QEMU_BUILD_BUG_ON
This allows to get rid of a check for older GCC version (which was a bit
bogus too since it was falling back on c++ version..)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201210134752.780923-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 12:52:10 -05:00
Markus Armbruster
4bd802b209 Clean up includes
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes, with the changes
to the following files manually reverted:

    contrib/libvhost-user/libvhost-user-glib.h
    contrib/libvhost-user/libvhost-user.c
    contrib/libvhost-user/libvhost-user.h
    contrib/plugins/hotblocks.c
    contrib/plugins/hotpages.c
    contrib/plugins/howvec.c
    contrib/plugins/lockstep.c
    linux-user/mips64/cpu_loop.c
    linux-user/mips64/signal.c
    linux-user/sparc64/cpu_loop.c
    linux-user/sparc64/signal.c
    linux-user/x86_64/cpu_loop.c
    linux-user/x86_64/signal.c
    target/s390x/gen-features.c
    tests/fp/platform.h
    tests/migration/s390x/a-b-bios.c
    tests/plugin/bb.c
    tests/plugin/empty.c
    tests/plugin/insn.c
    tests/plugin/mem.c
    tests/test-rcu-simpleq.c
    tests/test-rcu-slist.c
    tests/test-rcu-tailq.c
    tests/uefi-test-tools/UefiTestToolsPkg/BiosTablesTest/BiosTablesTest.c

contrib/plugins/, tests/plugin/, and tests/test-rcu-slist.c appear not
to include osdep.h intentionally.  The remaining reverts are the same
as in commit bbfff19688.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201113061216.2483385-1-armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Alexander Bulekov <alxndr@bu.edu>
2020-12-10 17:16:44 +01:00
Marc-André Lureau
0df750e9d3 libvhost-user: make it a meson subproject
By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Haotian Li
7632b56c8f virtiofsd: check whether strdup lo.source return NULL in main func
In main func, strdup lo.source may fail. So check whether strdup
lo.source return NULL before using it.

Signed-off-by: Haotian Li <lihaotian9@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Message-Id: <f1e48ca8-d6de-d901-63c8-4f4024bda518@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 16:25:38 +00:00
Haotian Li
db2e026a39 virtiofsd: check whether lo_map_reserve returns NULL in, main func
In main func, func lo_map_reserve is called without NULL check.
If reallocing new_elems fails in func lo_map_grow, the func
lo_map_reserve may return NULL. We should check whether
lo_map_reserve returns NULL before using it.

Signed-off-by: Haotian Li <lihaotian9@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Message-Id: <48887813-1c95-048c-6d10-48e3dd2bac71@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 16:25:32 +00:00
Haotian Li
7fa87944f8 tools/virtiofsd/buffer.c: check whether buf is NULL in fuse_bufvec_advance func
In fuse_bufvec_advance func, calling fuse_bufvec_current func
may return NULL, so we should check whether buf is NULL before
using it.

Signed-off-by: Haotian Li <lihaotian9@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Message-Id: <29fc87c2-b87c-4c34-40d4-75381f228849@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 16:25:23 +00:00
Max Reitz
f26688a911 virtiofsd: Announce submounts even without statx()
Contrary to what the check (and warning) in lo_init() claims, we can
announce submounts just fine even without statx() -- the check is based
on comparing both the mount ID and st_dev of parent and child.  Without
statx(), we will not have the mount ID; but we always have st_dev.

The only problems we have (without statx() and its mount ID) are:

(1) Mounting the same device twice may lead to both trees being treated
    as exactly the same tree by virtiofsd.  But that is a problem that
    is completely independent of mirroring host submounts in the guest.
    Both submount roots will still show the FUSE_SUBMOUNT flag, because
    their st_dev still differs from their respective parent.

(2) There is only one exception to (1), and that is if you mount a
    device inside a mount of itself: Then, its st_dev will be the same
    as that of its parent, and so without a mount ID, virtiofsd will not
    be able to recognize the nested mount's root as a submount.
    However, thanks to virtiofsd then treating both trees as exactly the
    same tree, it will be caught up in a loop when the guest tries to
    examine the nested submount, so the guest will always see nothing
    but an ELOOP there.  Therefore, this case is just fully broken
    without statx(), whether we check for submounts (based on st_dev) or
    not.

All in all, checking for submounts works well even without comparing the
mount ID (i.e., without statx()).  The only concern is an edge case
that, without statx() mount IDs, is utterly broken anyway.

Thus, drop said check in lo_init().

Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201103164135.169325-1-mreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 15:52:20 +00:00
Marc Hartmayer
cd57deabad meson: vhost-user-gpu/virtiofsd: use absolute path
The option `libexecdir` is relative to `prefix` (see
https://mesonbuild.com/Builtin-options.html), so we have to be aware
of this when creating 50-qemu-gpu.json and
50-qemu-virtiofsd.json. Otherwise, tools like libvirt will not be able
to find the executable.

Fixes: 16bf7a3326 ("configure: move directory options from config-host.mak to meson")
Signed-off-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Message-Id: <20201103112333.24734-1-mhartmay@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-03 09:42:53 -05:00
Max Reitz
9d82f6a3e6 virtiofsd: Announce sub-mount points
Whenever we encounter a directory with an st_dev or mount ID that
differs from that of its parent, we set the FUSE_ATTR_SUBMOUNT flag so
the guest can create a submount for it.

We only need to do so in lo_do_lookup().  The following functions return
a fuse_attr object:
- lo_create(), though fuse_reply_create(): Calls lo_do_lookup().
- lo_lookup(), though fuse_reply_entry(): Calls lo_do_lookup().
- lo_mknod_symlink(), through fuse_reply_entry(): Calls lo_do_lookup().
- lo_link(), through fuse_reply_entry(): Creating a link cannot create a
  submount, so there is no need to check for it.
- lo_getattr(), through fuse_reply_attr(): Announcing submounts when the
  node is first detected (at lookup) is sufficient.  We do not need to
  return the submount attribute later.
- lo_do_readdir(), through fuse_add_direntry_plus(): Calls
  lo_do_lookup().

Make announcing submounts optional, so submounts are only announced to
the guest with the announce_submounts option.  Some users may prefer the
current behavior, so that the guest learns nothing about the host mount
structure.

(announce_submounts is force-disabled when the guest does not present
the FUSE_SUBMOUNTS capability, or when there is no statx().)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-6-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 19:22:51 +00:00
Max Reitz
d672fce6ba virtiofsd: Add mount ID to the lo_inode key
Using st_dev is not sufficient to uniquely identify a mount: You can
mount the same device twice, but those are still separate trees, and
e.g. by mounting something else inside one of them, they may differ.

Using statx(), we can get a mount ID that uniquely identifies a mount.
If that is available, add it to the lo_inode key.

Most of this patch is taken from Miklos's mail here:
https://marc.info/?l=fuse-devel&m=160062521827983
(virtiofsd-use-mount-id.patch attachment)

Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-5-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 19:22:50 +00:00
Max Reitz
93e79851ab virtiofsd: Add attr_flags to fuse_entry_param
fuse_entry_param is converted to fuse_attr on the line (by
fill_entry()), so it should have a member that mirrors fuse_attr.flags.

fill_entry() should then copy this fuse_entry_param.attr_flags to
fuse_attr.flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-3-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 19:22:48 +00:00
Max Reitz
9c6ac04363 virtiofsd: Check FUSE_SUBMOUNTS
FUSE_SUBMOUNTS is a pure indicator by the kernel to signal that it
supports submounts.  It does not check its state in the init reply, so
there is nothing for fuse_lowlevel.c to do but to check its existence
and copy it into fuse_conn_info.capable.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-2-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 19:22:06 +00:00
Jiachen Zhang
0429eaf518 virtiofsd: Fix the help message of posix lock
The commit 88fc107956 disabled remote
posix locks by default. But the --help message still says it is enabled
by default. So fix it to output no_posix_lock.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Message-Id: <20201027081558.29904-1-zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 18:43:19 +00:00
Philippe Mathieu-Daudé
2693026042 tools/virtiofsd: Check vu_init() return value (CID 1435958)
Since commit 6f5fd83788, vu_init() can fail if malloc() returns NULL.

This fixes the following Coverity warning:

  CID 1435958 (#1 of 1): Unchecked return value (CHECKED_RETURN)

Fixes: 6f5fd83788 ("libvhost-user: support many virtqueues")
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201102092339.2034297-1-philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 18:32:41 +00:00
Dr. David Alan Gilbert
dcaac9f124 virtiofsd: Seccomp: Add 'send' for syslog
On ppc, and some other archs, it looks like syslog ends up using 'send'
rather than 'sendto'.

Reference: https://github.com/kata-containers/kata-containers/issues/1050

Reported-by: amulmek1@in.ibm.com
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102150750.34565-1-dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02 18:29:54 +00:00
Alex Williamson
33dc9914ea Revert series: virtiofsd: Announce submounts to the guest
This reverts the following commits due to their basis on a bogus
linux kernel header update:

c93a656f7b ("tests/acceptance: Add virtiofs_submounts.py")
45ced7ca2f ("tests/acceptance/boot_linux: Accept SSH pubkey")
08dce386e7 ("virtiofsd: Announce sub-mount points")
eba8b096c1 ("virtiofsd: Store every lo_inode's parent_dev")
ede24b6be7 ("virtiofsd: Add fuse_reply_attr_with_flags()")
e2577435d3 ("virtiofsd: Add attr_flags to fuse_entry_param")
2f10415abf ("virtiofsd: Announce FUSE_ATTR_FLAGS")
97d741cc96 ("linux/fuse.h: Pull in from Linux")

Cc: Max Reitz <mreitz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 160385090886.20017.13382256442750027666.stgit@gimli.home
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-28 13:17:32 +00:00
Max Reitz
08dce386e7 virtiofsd: Announce sub-mount points
Whenever we encounter a directory with an st_dev that differs from that
of its parent, we set the FUSE_ATTR_SUBMOUNT flag so the guest can
create a submount for it.

Make this behavior optional, so submounts are only announced to the
guest with the announce_submounts option.  Some users may prefer the
current behavior, so that the guest learns nothing about the host mount
structure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200909184028.262297-7-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Manual merge
2020-10-26 18:35:32 +00:00
Max Reitz
eba8b096c1 virtiofsd: Store every lo_inode's parent_dev
We want to detect mount points in the shared tree.  We report them to
the guest by setting the FUSE_ATTR_SUBMOUNT flag in fuse_attr.flags, but
because the FUSE client will create a submount for every directory that
has this flag set, we must do this only for the actual mount points.

We can detect mount points by comparing a directory's st_dev with its
parent's st_dev.  To be able to do so, we need to store the parent's
st_dev in the lo_inode object.

Note that mount points need not necessarily be directories; a single
file can be a mount point as well.  However, for the sake of simplicity
let us ignore any non-directory mount points for now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200909184028.262297-6-mreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Max Reitz
ede24b6be7 virtiofsd: Add fuse_reply_attr_with_flags()
The plain fuse_reply_attr() function does not allow setting
fuse_attr.flags, so add this new function that does.

Make fuse_reply_attr() a wrapper around it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200909184028.262297-5-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Max Reitz
e2577435d3 virtiofsd: Add attr_flags to fuse_entry_param
fuse_entry_param is converted to fuse_attr on the line (by
fill_entry()), so it should have a member that mirrors fuse_attr.flags.

fill_entry() should then copy this fuse_entry_param.attr_flags to
fuse_attr.flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200909184028.262297-4-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Max Reitz
2f10415abf virtiofsd: Announce FUSE_ATTR_FLAGS
The fuse_attr.flags field is currently just initialized to 0, which is
valid.  Thus, there is no reason not to always announce FUSE_ATTR_FLAGS
(when the kernel supports it).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200909184028.262297-3-mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Dr. David Alan Gilbert
1d84a0213a tools/virtiofsd: xattr name mappings: Simple 'map'
The mapping rule system implemented in the last few patches is
extremely flexible, but not easy to use.  Add a simple
'map' type as a sprinkling of sugar to make it easy.

e.g.

  -o xattrmap=":map::user.virtiofs.:"

would be sufficient to prefix all xattr's
or

  -o xattrmap=":map:trusted.:user.virtiofs.:"

would just prefix 'trusted.' xattr's and leave
everything else alone.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201023165812.36028-6-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Dr. David Alan Gilbert
6409cf19ca tools/virtiofsd: xattr name mappings: Map server xattr names
Map xattr names coming from the server, i.e. the host filesystem;
currently this is only from listxattr.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201023165812.36028-4-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Dr. David Alan Gilbert
4f088dbf98 tools/virtiofsd: xattr name mappings: Map client xattr names
Map xattr names originating at the client; from get/set/remove xattr.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201023165812.36028-3-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Dr. David Alan Gilbert
6084633dff tools/virtiofsd: xattr name mappings: Add option
Add an option to define mappings of xattr names so that
the client and server filesystems see different views.
This can be used to have different SELinux mappings as
seen by the guest, to run the virtiofsd with less privileges
(e.g. in a case where it can't set trusted/system/security
xattrs but you want the guest to be able to), or to isolate
multiple users of the same name; e.g. trusted attributes
used by stacking overlayfs.

A mapping engine is used with 3 simple rules; the rules can
be combined to allow most useful mapping scenarios.
The ruleset is defined by -o xattrmap='rules...'.

This patch doesn't use the rule maps yet.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201023165812.36028-2-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Stefan Hajnoczi
06844584b6 virtiofsd: add container-friendly -o sandbox=chroot option
virtiofsd cannot run in a container because CAP_SYS_ADMIN is required to
create namespaces.

Introduce a weaker sandbox mode that is sufficient in container
environments because the container runtime already sets up namespaces.
Use chroot to restrict path traversal to the shared directory.

virtiofsd loses the following:

1. Mount namespace. The process chroots to the shared directory but
   leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
   syscalls.

2. Pid namespace. This should be fine because virtiofsd is the only
   process running in the container.

3. Network namespace. This should be fine because seccomp already
   rejects the connect(2) syscall, but an additional layer of security
   is lost. Container runtime-specific network security policies can be
   used drop network traffic (except for the vhost-user UNIX domain
   socket).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201008085534.16070-1-stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Misono Tomohiro
800ad114f1 virtiofsd: passthrough_ll: set FUSE_LOG_INFO as default log_level
Just noticed that although help message says default log level is INFO,
it is actually 0 (EMRGE) and no mesage will be shown when error occurs.
It's better to follow help message.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Message-Id: <20201008110148.2757734-1-misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26 18:35:32 +00:00
Misono Tomohiro
cece116c93 configure: add option for virtiofsd
Currently it is unknown whether virtiofsd will be built at
configuration time. It will be automatically built when dependency
is met. Also, required libraries are not clear.

To make this clear, add configure option --{enable,disable}-virtiofsd.
The default is the same as current (enabled if available) like many
other options. When --enable-virtiofsd is given and dependency is not
met, we get:

  ERROR: Problem encountered: virtiofsd requires libcap-ng-devel and seccomp-devel
or
  ERROR: Problem encountered: virtiofsd needs tools and vhost-user support

In addition, configuration summary now includes virtiofsd entry:

  build virtiofs daemon: YES/NO

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Message-Id: <20201008103133.2722903-1-misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Manual merge
2020-10-26 18:35:32 +00:00
Peter Maydell
a95e0396c8 * fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
 * Start preparing for meson 0.56 (myself)
 * Move directory configuration to meson (myself)
 * Start untangling qemu_init (myself)
 * Windows fixes (Sunil)
 * Remove -no-kbm (Thomas)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe
 Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq
 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl
 saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF
 eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv
 t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg==
 =931x
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* fix --disable-tcg builds (Claudio)
* Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself)
* Start preparing for meson 0.56 (myself)
* Move directory configuration to meson (myself)
* Start untangling qemu_init (myself)
* Windows fixes (Sunil)
* Remove -no-kbm (Thomas)

# gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream:
  machine: move SMP initialization from vl.c
  machine: move UP defaults to class_base_init
  machine: remove deprecated -machine enforce-config-section option
  win32: boot broken when bind & data dir are the same
  WHPX: Fix WHPX build break
  configure: move install_blobs from configure to meson
  configure: remove unused variable from config-host.mak
  configure: move directory options from config-host.mak to meson
  configure: allow configuring localedir
  Makefile: separate meson rerun from the rest of the ninja invocation
  Remove deprecated -no-kvm option
  replay: do not build if TCG is not available
  qtest: unbreak non-TCG builds in bios-tables-test
  hw/core/qdev-clock: add a reference on aliased clocks
  do not use colons in test names
  meson: rewrite curses/iconv test
  build: fix macOS --enable-modules build

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-26 15:49:11 +00:00
Paolo Bonzini
16bf7a3326 configure: move directory options from config-host.mak to meson
Since installation is not part of Makefiles anymore, Make need not
know the directories anymore.  Meson already knows them through
built-in options, do everything using them instead of the config_host
dictionary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-26 07:08:38 -04:00
Coiby Xu
049f55502a libvhost-user: Allow vu_message_read to be replaced
Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions. Thus reading vhost-user message won't stall the
guest. For slave channel, we still use the default vu_message_read.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23 13:42:16 +01:00
Stefan Hajnoczi
ebf101955c virtiofsd: avoid /proc/self/fd tempdir
In order to prevent /proc/self/fd escapes a temporary directory is
created where /proc/self/fd is bind-mounted. This doesn't work on
read-only file systems.

Avoid the temporary directory by bind-mounting /proc/self/fd over /proc.
This does not affect other processes since we remounted / with MS_REC |
MS_SLAVE. /proc must exist and virtiofsd does not use it so it's safe to
do this.

Path traversal can be tested with the following function:

  static void test_proc_fd_escape(struct lo_data *lo)
  {
      int fd;
      int level = 0;
      ino_t last_ino = 0;

      fd = lo->proc_self_fd;
      for (;;) {
          struct stat st;

          if (fstat(fd, &st) != 0) {
              perror("fstat");
              return;
          }
          if (last_ino && st.st_ino == last_ino) {
              fprintf(stderr, "inode number unchanged, stopping\n");
              return;
          }
          last_ino = st.st_ino;

          fprintf(stderr, "Level %d dev %lu ino %lu\n", level,
                  (unsigned long)st.st_dev,
                  (unsigned long)last_ino);
          fd = openat(fd, "..", O_PATH | O_DIRECTORY | O_NOFOLLOW);
          level++;
      }
  }

Before and after this patch only Level 0 is displayed. Without
/proc/self/fd bind-mount protection it is possible to traverse parent
directories.

Fixes: 397ae982f4 ("virtiofsd: jail lo->proc_self_fd")
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201006095826.59813-1-stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-12 12:39:38 +01:00
Dr. David Alan Gilbert
ff3995e2f0 virtiofsd: Call qemu_init_exec_dir
Since fcb4f59c87 qemu_get_local_state_pathname relies on the
init_exec_dir, and virtiofsd asserts because we never set it.
Set it.

Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201002124015.44820-1-dgilbert@redhat.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-12 12:39:38 +01:00
Alex Bennée
f6698f2b03 tools/virtiofsd: add support for --socket-group
If you like running QEMU as a normal user (very common for TCG runs)
but you have to run virtiofsd as a root user you run into connection
problems. Adding support for an optional --socket-group allows the
users to keep using the command line.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Message-Id: <20200925125147.26943-2-alex.bennee@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Split long line
2020-10-12 12:39:38 +01:00
Dr. David Alan Gilbert
2acf4f8fdd virtiofsd: Silence gcc warning
Gcc worries fd might be used unset, in reality it's always set if
fi is set, and only used if fi is set so it's safe.  Initialise it to -1
just to keep gcc happy for now.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200827153657.111098-2-dgilbert@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-12 12:39:38 +01:00
Jiachen Zhang
e12a0edafe virtiofsd: Add -o allow_direct_io|no_allow_direct_io options
Due to the commit 65da453980, the O_DIRECT
open flag of guest applications will be discarded by virtiofsd. While
this behavior makes it consistent with the virtio-9p scheme when guest
applications use direct I/O, we no longer have any chance to bypass the
host page cache.

Therefore, we add a flag 'allow_direct_io' to lo_data. If '-o
 no_allow_direct_io' option is added, or none of '-o allow_direct_io' or
 '-o no_allow_direct_io' is added, the 'allow_direct_io' will be set to
 0, and virtiofsd discards O_DIRECT as before. If '-o allow_direct_io'
is added to the starting command-line, 'allow_direct_io' will be set to
1, so that the O_DIRECT flags will be retained and host page cache can
be bypassed.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200824105957.61265-1-zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Vivek Goyal
04d325e86f virtiofsd: Used glib "shared" thread pool
glib offers thread pools and it seems to support "exclusive" and "shared"
thread pools.

https://developer.gnome.org/glib/stable/glib-Thread-Pools.html#g-thread-pool-new

Currently we use "exlusive" thread pools but its performance seems to be
poor. I tried using "shared" thread pools and performance seems much
better. I posted performance results here.

https://www.redhat.com/archives/virtio-fs/2020-September/msg00080.html

So lets switch to shared thread pools. We can think of making it optional
once somebody can show in what cases exclusive thread pools offer better
results. For now, my simple performance tests across the board see
better results with shared thread pools.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20200921213216.GE13362@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  With seccomp fix from Miklos
2020-09-25 12:45:58 +01:00
Marc-André Lureau
ab4c0996f8 meson: use meson datadir instead of qemu_datadir
When cross-compiling, by default qemu_datadir is 'c:\Program
Files\QEMU', which is not recognized as being an absolute path, and
meson will end up adding the prefix again.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200826110419.528931-6-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-01 08:51:33 -04:00
Stefan Hajnoczi
fd9279ec99 virtiofsd: probe unshare(CLONE_FS) and print an error
An assertion failure is raised during request processing if
unshare(CLONE_FS) fails. Implement a probe at startup so the problem can
be detected right away.

Unfortunately Docker/Moby does not include unshare in the seccomp.json
list unless CAP_SYS_ADMIN is given. Other seccomp.json lists always
include unshare (e.g. podman is unaffected):
https://raw.githubusercontent.com/seccomp/containers-golang/master/seccomp.json

Use "docker run --security-opt seccomp=path/to/seccomp.json ..." if the
default seccomp.json is missing unshare.

Cc: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200727190223.422280-4-stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Stefan Hajnoczi
1c7cb1f52e virtiofsd: drop CAP_DAC_READ_SEARCH
virtiofsd does not need CAP_DAC_READ_SEARCH because it already has
the more powerful CAP_DAC_OVERRIDE. Drop it from the list of
capabilities.

This is important because container runtimes may not include
CAP_DAC_READ_SEARCH by default. This patch allows virtiofsd to reduce
its capabilities when running inside a Docker container.

Note that CAP_DAC_READ_SEARCH may be necessary again in the future if
virtiofsd starts using open_by_handle_at(2).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200727190223.422280-2-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Sergio Lopez
e9a78564a1 virtiofsd: Remove "norace" from cmdline help and docs
Commit 93bb3d8d4c ("virtiofsd: remove symlink fallbacks") removed
the implementation of the "norace" option, so remove it from the
cmdline help and the documentation too.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200717121110.50580-1-slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Vivek Goyal
88fc107956 virtiofsd: Disable remote posix locks by default
Right now we enable remote posix locks by default. That means when guest
does a posix lock it sends request to server (virtiofsd). But currently
we only support non-blocking posix lock and return -EOPNOTSUPP for
blocking version.

This means that existing applications which are doing blocking posix
locks get -EOPNOTSUPP and fail. To avoid this, people have been
running virtiosd with option "-o no_posix_lock". For new users it
is still a surprise and trial and error takes them to this option.

Given posix lock implementation is not complete in virtiofsd, disable
it by default. This means that posix locks will work with-in applications
in a guest but not across guests. Anyway we don't support sharing
filesystem among different guests yet in virtiofs so this should
not lead to any kind of surprise or regression and will make life
little easier for virtiofs users.

Reported-by: Aa Aa <jimbothom@yandex.com>
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-08-28 13:34:52 +01:00
Paolo Bonzini
3f99cf5710 tools/virtiofsd: convert to Meson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-21 06:30:09 -04:00
Dr. David Alan Gilbert
3005c099ef virtiofsd: Allow addition or removal of capabilities
Allow capabilities to be added or removed from the allowed set for the
daemon; e.g.

default:
CapPrm: 00000000880000df
CapEff: 00000000880000df

-o modcaps=+sys_admin

CapPrm: 00000000882000df
CapEff: 00000000882000df

-o modcaps=+sys_admin:-chown

CapPrm: 00000000882000de
CapEff: 00000000882000de

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200629115420.98443-4-dgilbert@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-07-03 16:23:05 +01:00
Dr. David Alan Gilbert
55b22a60cc virtiofsd: Check capability calls
Check the capability calls worked.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20200629115420.98443-3-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-07-03 16:23:05 +01:00
Dr. David Alan Gilbert
b1288dfafb virtiofsd: Terminate capability list
capng_updatev is a varargs function that needs a -1 to terminate it,
but it was missing.

In practice what seems to have been happening is that it's added the
capabilities we asked for, then runs into junk on the stack, so if
we're unlucky it might be adding some more, but in reality it's
failing - but after adding the capabilities we asked for.

Fixes: a59feb483b ("virtiofsd: only retain file system capabilities")
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20200629115420.98443-2-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-07-03 16:23:05 +01:00
Max Reitz
63659fe74e virtiofsd: Whitelist fchmod
lo_setattr() invokes fchmod() in a rarely used code path, so it should
be whitelisted or virtiofsd will crash with EBADSYS.

Said code path can be triggered for example as follows:

On the host, in the shared directory, create a file with the sticky bit
set and a security.capability xattr:
(1) # touch foo
(2) # chmod u+s foo
(3) # setcap '' foo

Then in the guest let some process truncate that file after it has
dropped all of its capabilities (at least CAP_FSETID):

int main(int argc, char *argv[])
{
    capng_setpid(getpid());
    capng_clear(CAPNG_SELECT_BOTH);
    capng_updatev(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE, 0);
    capng_apply(CAPNG_SELECT_BOTH);

    ftruncate(open(argv[1], O_RDWR), 0);
}

This will cause the guest kernel to drop the sticky bit (i.e. perform a
mode change) as part of the truncate (where FATTR_FH is set), and that
will cause virtiofsd to invoke fchmod() instead of fchmodat().

(A similar configuration exists further below with futimens() vs.
utimensat(), but the former is not a syscall but just a wrapper for the
latter, so no further whitelisting is required.)

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1842667
Reported-by: Qian Cai <caiqian@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200608093111.14942-1-mreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-17 17:48:38 +01:00
Miklos Szeredi
93bb3d8d4c virtiofsd: remove symlink fallbacks
Path lookup in the kernel has special rules for looking up magic symlinks
under /proc.  If a filesystem operation is instructed to follow symlinks
(e.g. via AT_SYMLINK_FOLLOW or lack of AT_SYMLINK_NOFOLLOW), and the final
component is such a proc symlink, then the target of the magic symlink is
used for the operation, even if the target itself is a symlink.  I.e. path
lookup is always terminated after following a final magic symlink.

I was erronously assuming that in the above case the target symlink would
also be followed, and so workarounds were added for a couple of operations
to handle the symlink case.  Since the symlink can be handled simply by
following the proc symlink, these workardouds are not needed.

Also remove the "norace" option, which disabled the workarounds.

Commit bdfd667883 ("virtiofsd: Fix xattr operations") already dealt with
the same issue for xattr operations.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Message-Id: <20200514140736.20561-1-mszeredi@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 18:44:27 +01:00
Stefan Hajnoczi
66502bbca3 virtiofsd: drop all capabilities in the wait parent process
All this process does is wait for its child.  No capabilities are
needed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-01 20:05:37 +01:00
Stefan Hajnoczi
a59feb483b virtiofsd: only retain file system capabilities
virtiofsd runs as root but only needs a subset of root's Linux
capabilities(7).  As a file server its purpose is to create and access
files on behalf of a client.  It needs to be able to access files with
arbitrary uid/gid owners.  It also needs to be create device nodes.

Introduce a Linux capabilities(7) whitelist and drop all capabilities
that we don't need, making the virtiofsd process less powerful than a
regular uid root process.

  # cat /proc/PID/status
  ...
          Before           After
  CapInh: 0000000000000000 0000000000000000
  CapPrm: 0000003fffffffff 00000000880000df
  CapEff: 0000003fffffffff 00000000880000df
  CapBnd: 0000003fffffffff 0000000000000000
  CapAmb: 0000000000000000 0000000000000000

Note that file capabilities cannot be used to achieve the same effect on
the virtiofsd executable because mount is used during sandbox setup.
Therefore we drop capabilities programmatically at the right point
during startup.

This patch only affects the sandboxed child process.  The parent process
that sits in waitpid(2) still has full root capabilities and will be
addressed in the next patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20200416164907.244868-2-stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-01 18:57:31 +01:00
Max Reitz
ace0829c0d virtiofsd: Show submounts
Currently, setup_mounts() bind-mounts the shared directory without
MS_REC.  This makes all submounts disappear.

Pass MS_REC so that the guest can see submounts again.

Fixes: 5baa3b8e95
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424133516.73077-1-mreitz@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Changed Fixes to point to the commit with the problem rather than
          the commit that turned it on
2020-05-01 18:52:17 +01:00
Miklos Szeredi
397ae982f4 virtiofsd: jail lo->proc_self_fd
While it's not possible to escape the proc filesystem through
lo->proc_self_fd, it is possible to escape to the root of the proc
filesystem itself through "../..".

Use a temporary mount for opening lo->proc_self_fd, that has it's root at
/proc/self/fd/, preventing access to the ancestor directories.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Message-Id: <20200429124733.22488-1-mszeredi@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-01 18:46:54 +01:00
Stefan Hajnoczi
8c1d353d10 virtiofsd: stay below fs.file-max sysctl value (CVE-2020-10717)
The system-wide fs.file-max sysctl value determines how many files can
be open.  It defaults to a value calculated based on the machine's RAM
size.  Previously virtiofsd would try to set RLIMIT_NOFILE to 1,000,000
and this allowed the FUSE client to exhaust the number of open files
system-wide on Linux hosts with less than 10 GB of RAM!

Take fs.file-max into account when choosing the default RLIMIT_NOFILE
value.

Fixes: CVE-2020-10717
Reported-by: Yuval Avrahami <yavrahami@paloaltonetworks.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200501140644.220940-3-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-01 18:41:56 +01:00
Stefan Hajnoczi
6dbb716877 virtiofsd: add --rlimit-nofile=NUM option
Make it possible to specify the RLIMIT_NOFILE on the command-line.
Users running multiple virtiofsd processes should allocate a certain
number to each process so that the system-wide limit can never be
exhausted.

When this option is set to 0 the rlimit is left at its current value.
This is useful when a management tool wants to configure the rlimit
itself.

The default behavior remains unchanged: try to set the limit to
1,000,000 file descriptors if the current rlimit is lower.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200501140644.220940-2-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-01 18:41:55 +01:00
Philippe Mathieu-Daudé
e1cd92d95c tools/virtiofsd/passthrough_ll: Fix double close()
On success, the fdopendir() call closes fd. Later on the error
path we try to close an already-closed fd. This can lead to
use-after-free. Fix by only closing the fd if the fdopendir()
call failed.

Cc: qemu-stable@nongnu.org
Fixes: b39bce121b (add dirp_map to hide lo_dirp pointers)
Reported-by: Coverity (CID 1421933 USE_AFTER_FREE)
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200321120654.7985-1-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-25 12:31:38 +00:00
Misono Tomohiro
bdfd667883 virtiofsd: Fix xattr operations
Current virtiofsd has problems about xattr operations and
they does not work properly for directory/symlink/special file.

The fundamental cause is that virtiofsd uses openat() + f...xattr()
systemcalls for xattr operation but we should not open symlink/special
file in the daemon. Therefore the function is restricted.

Fix this problem by:
 1. during setup of each thread, call unshare(CLONE_FS)
 2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
    file or directory, use fchdir(proc_loot_fd) + ...xattr() +
    fchdir(root.fd) instead of openat() + f...xattr()

    (Note: for a regular file/directory openat() + f...xattr()
     is still used for performance reason)

With this patch, xfstests generic/062 passes on virtiofs.

This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
The original discussion can be found here:
  https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-03 15:13:24 +00:00
Misono Tomohiro
16e15a7308 virtiofsd: passthrough_ll: cleanup getxattr/listxattr
This is a cleanup patch to simplify the following xattr fix and
there is no functional changes.

- Move memory allocation to head of the function
- Unify fgetxattr/flistxattr call for both size == 0 and
  size != 0 case
- Remove redundant lo_inode_put call in error path
  (Note: second call is ignored now since @inode is already NULL)

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Message-Id: <20200227055927.24566-2-misono.tomohiro@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-03-03 15:13:24 +00:00
Xiao Yang
285eb7a704 virtiofsd: Remove fuse.h and struct fuse_module
All code in fuse.h and struct fuse_module are not used by virtiofsd
so removing them is safe.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé
09c086b2a1 tools/virtiofsd/fuse_lowlevel: Fix fuse_out_header::error value
Fix warning reported by Clang static code analyzer:

    CC      tools/virtiofsd/fuse_lowlevel.o
  tools/virtiofsd/fuse_lowlevel.c:195:9: warning: Value stored to 'error' is never read
          error = -ERANGE;
          ^       ~~~~~~~

Fixes: 3db2876
Reported-by: Clang Static Analyzer
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé
4e1fb9e7bc tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment
Fix warning reported by Clang static code analyzer:

    CC      tools/virtiofsd/passthrough_ll.o
  tools/virtiofsd/passthrough_ll.c:925:9: warning: Value stored to 'newfd' is never read
          newfd = -1;
          ^       ~~
  tools/virtiofsd/passthrough_ll.c:942:9: warning: Value stored to 'newfd' is never read
          newfd = -1;
          ^       ~~

Fixes: 7c6b66027
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-21 12:53:17 +00:00
Philippe Mathieu-Daudé
d4db6f545d tools/virtiofsd/passthrough_ll: Remove unneeded variable assignment
Fix warning reported by Clang static code analyzer:

    CC      tools/virtiofsd/passthrough_ll.o
  tools/virtiofsd/passthrough_ll.c:1083:5: warning: Value stored to 'saverr' is never read
      saverr = ENOMEM;
      ^        ~~~~~~

Fixes: 7c6b66027
Reported-by: Clang Static Analyzer
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-21 12:53:17 +00:00
Dr. David Alan Gilbert
82c1474e68 virtiofsd: Help message fix for 'seconds'
second should be seconds.

Reported-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-02-21 12:53:17 +00:00
Dr. David Alan Gilbert
99ce9a7e60 virtiofsd: do_read missing NULL check
Missing a NULL check if the argument fetch fails.

Fixes: Coverity CID 1413119
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert
686391112f virtiofsd: load_capng missing unlock
Missing unlock in error path.

Fixes: Covertiy CID 1413123
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert
6fa249027f virtiofsd: fv_create_listen_socket error path socket leak
If we fail when bringing up the socket we can leak the listen_fd;
in practice the daemon will exit so it's not really a problem.

Fixes: Coverity CID 1413121
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-10 17:24:43 +00:00
Dr. David Alan Gilbert
988717b46b virtiofsd: Remove fuse_req_getgroups
Remove fuse_req_getgroups that's unused in virtiofsd; it came in
from libfuse but we don't actually use it.  It was called from
fuse_getgroups which we previously removed (but had left it's header
in).

Coverity had complained about null termination in it, but removing
it is the easiest answer.

Fixes: Coverity CID: 1413117 (String not null terminated)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-10 17:24:43 +00:00
Masayoshi Mizuma
1d59b1b210 virtiofsd: add some options to the help message
Add following options to the help message:
- cache
- flock|no_flock
- norace
- posix_lock|no_posix_lock
- readdirplus|no_readdirplus
- timeout
- writeback|no_writeback
- xattr|no_xattr

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

dgilbert: Split cache, norace, posix_lock, readdirplus off
  into our own earlier patches that added the options

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Eryu Guan
9883df8cca virtiofsd: stop all queue threads on exit in virtio_loop()
On guest graceful shutdown, virtiofsd receives VHOST_USER_GET_VRING_BASE
request from VMM and shuts down virtqueues by calling fv_set_started(),
which joins fv_queue_thread() threads. So when virtio_loop() returns,
there should be no thread is still accessing data in fuse session and/or
virtio dev.

But on abnormal exit, e.g. guest got killed for whatever reason,
vhost-user socket is closed and virtio_loop() breaks out the main loop
and returns to main(). But it's possible fv_queue_worker()s are still
working and accessing fuse session and virtio dev, which results in
crash or use-after-free.

Fix it by stopping fv_queue_thread()s before virtio_loop() returns,
to make sure there's no-one could access fuse session and virtio dev.

Reported-by: Qingming Su <qingming.su@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Xiao Yang
a931b6861e virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()
lo_copy_file_range() passes -errno to fuse_reply_err() and then fuse_reply_err()
changes it to errno again, so that subsequent fuse_send_reply_iov_nofree() catches
the wrong errno.(i.e. reports "fuse: bad error value: ...").

Make fuse_send_reply_iov_nofree() accept the correct -errno by passing errno
directly in lo_copy_file_range().

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eryu Guan <eguan@linux.alibaba.com>

dgilbert: Sent upstream and now Merged as aa1185e153f774f1df65
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert
fe4c15798a virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
lo_destroy was relying on some implicit knowledge of the locking;
we can avoid this if we create an unref_inode that doesn't take
the lock and then grab it for the whole of the lo_destroy.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi
951b3120db virtiofsd: add --thread-pool-size=NUM option
Add an option to control the size of the thread pool.  Requests are now
processed in parallel by default.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi
28f7a3b026 virtiofsd: fix lo_destroy() resource leaks
Now that lo_destroy() is serialized we can call unref_inode() so that
all inode resources are freed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi
cdc497c692 virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
When running with multiple threads it can be tricky to handle
FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
malicious clients cannot trigger race conditions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi
a3d756c5ae virtiofsd: process requests in a thread pool
Introduce a thread pool so that fv_queue_thread() just pops
VuVirtqElements and hands them to the thread pool.  For the time being
only one worker thread is allowed since passthrough_ll.c is not
thread-safe yet.  Future patches will lift this restriction so that
multiple FUSE requests can be processed in parallel.

The main new concept is struct FVRequest, which contains both
VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
fv_QueueInfo's fields are moved into FVRequest because they are
per-request.  The name FVRequest conforms to QEMU coding style and I
expect the struct fv_* types will be renamed in a future refactoring.

This patch series is not optimal.  fbuf reuse is dropped so each request
does malloc(se->bufsize), but there is no clean and cheap way to keep
this with a thread pool.  The vq_lock mutex is held for longer than
necessary, especially during the eventfd_write() syscall.  Performance
can be improved in the future.

prctl(2) had to be added to the seccomp whitelist because glib invokes
it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
piaojun
c465bba2c9 virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
fuse_buf_writev() only handles the normal write in which src is buffer
and dest is fd. Specially if src buffer represents guest physical
address that can't be mapped by the daemon process, IO must be bounced
back to the VMM to do it by fuse_buf_copy().

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
piaojun
9ceaaa15cf virtiofsd: add definition of fuse_buf_writev()
Define fuse_buf_writev() which use pwritev and writev to improve io
bandwidth. Especially, the src bufs with 0 size should be skipped as
their mems are not *block_size* aligned which will cause writev failed
in direct io mode.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Misono Tomohiro
9b610b09b4 virtiofsd: passthrough_ll: Use cache_readdir for directory open
Since keep_cache(FOPEN_KEEP_CACHE) has no effect for directory as
described in fuse_common.h, use cache_readdir(FOPNE_CACHE_DIR) for
diretory open when cache=always mode.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Misono Tomohiro
8e4e41e39e virtiofsd: Fix data corruption with O_APPEND write in writeback mode
When writeback mode is enabled (-o writeback), O_APPEND handling is
done in kernel. Therefore virtiofsd clears O_APPEND flag when open.
Otherwise O_APPEND flag takes precedence over pwrite() and write
data may corrupt.

Currently clearing O_APPEND flag is done in lo_open(), but we also
need the same operation in lo_create(). So, factor out the flag
update operation in lo_open() to update_open_flags() and call it
in both lo_open() and lo_create().

This fixes the failure of xfstest generic/069 in writeback mode
(which tests O_APPEND write data integrity).

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00