mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Markus Armbruster	52581c718c	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]	2022-05-11 16:49:06 +02:00
Marc-André Lureau	bd2142c353	virtiofsd: replace pipe() with g_unix_open_pipe(CLOEXEC) Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2022-05-03 15:47:08 +04:00
Richard Henderson	c49abc8406	Pull request Small contrib/vhost-user-blk, contrib/vhost-user-scsi, and tools/virtiofsd improvements. Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging # gpg: Signature made Mon 25 Apr 2022 01:48:38 AM PDT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] * tag 'block-pull-request' of https://gitlab.com/stefanha/qemu: virtiofsd: Add docs/helper for killpriv_v2/no_killpriv_v2 option contrib/vhost-user-blk: add missing GOptionEntry NULL terminator Implements Backend Program conventions for vhost-user-scsi Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2022-04-25 10:21:56 -07:00
Marc-André Lureau	1fbf2665e6	util: replace qemu_get_local_state_pathname() Simplify the function to only return the directory path. Callers are adjusted to use the GLib function to build paths, g_build_filename(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-39-marcandre.lureau@redhat.com>	2022-04-21 17:09:09 +04:00
Marc-André Lureau	49f9522193	include: rename qemu-common.h qemu/help-texts.h Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Warner Losh <imp@bsdimp.com> Message-Id: <20220420132624.2439741-7-marcandre.lureau@redhat.com>	2022-04-21 16:58:24 +04:00
Liu Yiding	d45c83328f	virtiofsd: Add docs/helper for killpriv_v2/no_killpriv_v2 option virtiofsd has introduced killpriv_v2/no_killpriv_v2 for a while. Add description of it to docs/helper. Signed-off-by: Liu Yiding <liuyd.fnst@fujitsu.com> Message-Id: <20220421095151.2231099-1-liuyd.fnst@fujitsu.com> [Small documentation fixes: s/as client supports/as the client supports/ and s/. /. /. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-04-21 12:05:15 +02:00
Hanna Reitz	4ce7a08d3e	virtiofsd: Let meson check for statx.stx_mnt_id In virtiofsd, we assume that the presence of the STATX_MNT_ID macro implies existence of the statx.stx_mnt_id field. Unfortunately, that is not necessarily the case: glibc has introduced the macro in its commit 88a2cf6c4bab6e94a65e9c0db8813709372e9180, but the statx.stx_mnt_id field is still missing from its own headers. Let meson.build actually check for both STATX_MNT_ID and statx.stx_mnt_id, and set CONFIG_STATX_MNT_ID if both are present. Then, use this config macro in virtiofsd. Closes: https://gitlab.com/qemu-project/qemu/-/issues/882 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220223092340.9043-1-hreitz@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-03-02 18:12:40 +00:00
Greg Kurz	45b04ef48d	virtiofsd: Add basic support for FUSE_SYNCFS request Honor the expected behavior of syncfs() to synchronously flush all data and metadata to disk on linux systems. If virtiofsd is started with '-o announce_submounts', the client is expected to send a FUSE_SYNCFS request for each individual submount. In this case, we just create a new file descriptor on the submount inode with lo_inode_open(), call syncfs() on it and close it. The intermediary file is needed because O_PATH descriptors aren't backed by an actual file and syncfs() would fail with EBADF. If virtiofsd is started without '-o announce_submounts' or if the client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client only sends a single FUSE_SYNCFS request for the root inode. The server would thus need to track submounts internally and call syncfs() on each of them. This will be implemented later. Note that syncfs() might suffer from a time penalty if the submounts are being hammered by some unrelated workload on the host. The only solution to prevent that is to avoid shared mounts. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20220215181529.164070-2-groug@kaod.org> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	963061dc11	virtiofsd: Add an option to enable/disable security label Provide an option "-o security_label/no_security_label" to enable/disable security label functionality. By default these are turned off. If enabled, the server will indicate to the client that it is capable of handling one security label during file creation. Typically this is expected to be a SELinux label. The file server will set this label on the file. It will try to set it atomically wherever possible, but that is not possible in all cases. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-11-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	a675c9a600	virtiofsd: Create new file using O_TMPFILE and set security context If guest and host policies can't work with each other, then guest security context (selinux label) needs to be set into an xattr. Say remap guest security.selinux xattr to trusted.virtiofs.security.selinux. That means setting "fscreate" is not going to help as that's only useful for security.selinux xattr on host. So we need another method which is atomic. Use O_TMPFILE to create new file, set xattr and then linkat() to proper place. But this works only for regular files. So dir, symlinks will continue to be non-atomic. Also if host filesystem does not support O_TMPFILE, we fall back to non-atomic behavior. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-10-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	0c3f81e131	virtiofsd: Create new file with security context This patch adds support for creating new file with security context as sent by client. It basically takes three paths. - If no security context enabled, then it continues to create files without security context. - If security context is enabled but security.selinux has not been remapped, then it uses /proc/thread-self/attr/fscreate knob to set security context and then create the file. This will make sure that newly created file gets the security context as set in "fscreate" and this is atomic w.r.t file creation. This is useful when host and guest SELinux policies don't conflict and can work with each other. In that case, guest security.selinux xattr is not remapped and it is passed through as "security.selinux" xattr on host. - If security context is enabled but security.selinux xattr has been remapped to something else, then it first creates the file and then uses setxattr() to set the remapped xattr with the security context. This is a non-atomic operation w.r.t file creation. This mode will be most versatile and allow host and guest to have their own separate SELinux xattrs and have their own separate SELinux policies. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-9-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	cb282e556a	virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate Soon we will be able to create and also set security context on the file atomically using /proc/self/task/tid/attr/fscreate knob. If this knob is available on the system, first set the knob with the desired context and then create the file. It will be created with the context set in fscreate. This works basically for SELinux and it's per thread. This patch just introduces the helper functions. Subsequent patches will make use of these helpers. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-8-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Manually merged gettid syscall number fixup from Vivek	2022-02-17 17:22:26 +00:00
Vivek Goyal	81489726ad	virtiofsd: Move core file creation code in separate function Move core file creation bits in a separate function. Soon this is going to get more complex as file creation need to set security context also. And there will be multiple modes of file creation in next patch. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-7-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	36cfab870e	virtiofsd, fuse_lowlevel.c: Add capability to parse security context Add capability to enable and parse security context as sent by client and put into fuse_req. Filesystems now can get security context from request and set it on files during creation. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-6-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	4c7c393c7b	virtiofsd: Extend size of fuse_conn_info->capable and ->want fields ->capable keeps track of what capabilities kernel supports and ->want keeps track of what capabilities filesystem wants. Right now these fields are 32bit in size. But now fuse has run out of bits and capabilities can now have bit numbers which are higher than 31. That means 32 bit fields are not sufficient anymore. Increase size to 64 bit so that we can add newer capabilities and still be able to use existing code to check and set the capabilities. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-5-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:22:26 +00:00
Vivek Goyal	776dc4b165	virtiofsd: Parse extended "struct fuse_init_in" Add some code to parse extended "struct fuse_init_in". And use a local variable "flag" to represent 64 bit flags. This will make it easier to add more features without having to worry about two 32bit flags (->flags and ->flags2) in "fuse_struct_in". Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-4-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed up long line	2022-02-17 17:22:19 +00:00
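The 64-bit flag handling described in this commit can be sketched as follows. This is a minimal illustration rather than the virtiofsd code: the struct `init_flag_words` and both helper names are hypothetical, and the sketch assumes that `flags` carries capability bits 0..31 and `flags2` carries bits 32..63.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit flag words of the init message. */
struct init_flag_words {
    uint32_t flags;   /* assumed: capability bits 0..31 */
    uint32_t flags2;  /* assumed: capability bits 32..63 */
};

/* Merge both words into one 64-bit value so callers test a single variable. */
static uint64_t init_flags_combine(const struct init_flag_words *w)
{
    assert(w != NULL);
    return (uint64_t)w->flags | ((uint64_t)w->flags2 << 32);
}

/* Split a 64-bit capability mask back into the two wire-format words. */
static void init_flags_split(uint64_t mask, struct init_flag_words *w)
{
    assert(w != NULL);
    w->flags = (uint32_t)(mask & UINT32_MAX);
    w->flags2 = (uint32_t)(mask >> 32);
}
```

With a single 64-bit variable, a capability whose bit number is above 31 can be checked with the same `mask & (1ULL << bit)` test as the older ones.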
Vivek Goyal	a086d54c6f	virtiofsd: Fix breakage due to fuse_init_in size change Kernel version 5.17 has increased the size of "struct fuse_init_in" struct. Previously this struct was 16 bytes and now it has been extended to 64 bytes in size. Once qemu headers are updated to latest, it will expect to receive 64 byte size struct (for protocol version major 7 and minor > 6). But if guest is booting older kernel (older than 5.17), then it still sends older fuse_init_in of size 16 bytes. And do_init() fails. It is expecting 64 byte struct. And this results in mount of virtiofs failing. Fix this by parsing 16 bytes only for now. Separate patches will be posted which will parse rest of the bytes and enable new functionality. Right now we don't support any of the new functionality, so we don't lose anything by not parsing bytes beyond 16. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20220208204813.682906-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-02-17 17:21:43 +00:00
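The fix described in this commit amounts to reading only the 16-byte prefix that old and new guests both send, and ignoring anything after it. The following is a sketch under assumptions: `struct init_in_compat` and `parse_init_compat()` are hypothetical names, and the field layout is an illustrative guess at the first 16 bytes rather than a copy of the QEMU headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the 16-byte prefix shared by old and new guests. */
struct init_in_compat {
    uint32_t major;
    uint32_t minor;
    uint32_t max_readahead;
    uint32_t flags;
};

/*
 * Accept any message that holds at least the 16-byte prefix.
 * Bytes beyond the prefix (sent by newer kernels) are ignored here.
 * Returns 0 on success, -1 if the message is too short.
 */
static int parse_init_compat(const void *buf, size_t len,
                             struct init_in_compat *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out)) {
        return -1;
    }
    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

Checking for a minimum length instead of an exact length is what lets a 16-byte message from an older guest and a 64-byte message from a newer guest both pass.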
Sebastian Hasler	41af4459ac	virtiofsd: Do not support blocking flock With the current implementation, blocking flock can lead to deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts to perform a blocking flock request. Signed-off-by: Sebastian Hasler <sebastian.hasler@stuvus.uni-stuttgart.de> Message-Id: <20220113153249.710216-1-sebastian.hasler@stuvus.uni-stuttgart.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2022-02-16 17:29:31 +00:00
Christian Ehrhardt	7b223e3860	tools/virtiofsd: Add rseq syscall to the seccomp allowlist The virtiofsd currently crashes when used with glibc 2.35. That is due to the rseq system call being added to every thread creation [1][2]. [1]: https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ [2]: https://sourceware.org/pipermail/libc-alpha/2022-February/136040.html This happens not at daemon start, but when a guest connects /usr/lib/qemu/virtiofsd -f --socket-path=/tmp/testvfsd -o sandbox=chroot \ -o source=/var/guests/j-virtiofs --socket-group=kvm virtio_session_mount: Waiting for vhost-user socket connection... # start ok, now guest will connect virtio_session_mount: Received vhost-user socket connection virtio_loop: Entry fv_queue_set_started: qidx=0 started=1 fv_queue_set_started: qidx=1 started=1 Bad system call (core dumped) We have to put rseq on the seccomp allowlist to avoid that the daemon is crashing in this case. Reported-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20220209111456.3328420-1-christian.ehrhardt@canonical.com [Moved rseq to its alphabetically ordered position in the seccomp allowlist. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-02-14 17:11:20 +00:00
Vivek Goyal	449e8171f9	virtiofsd: Drop membership of all supplementary groups (CVE-2022-0358) At the start, drop membership of all supplementary groups. This is not required. If we have membership of "root" supplementary group and when we switch uid/gid using setresuid/setresgid, we still retain membership of existing supplementary groups. And that can allow some operations which are not normally allowed. For example, if root in guest creates a dir as follows. $ mkdir -m 03777 test_dir This sets SGID on dir as well as allows unprivileged users to write into this dir. And now as unprivileged user open file as follows. $ su test $ fd = open("test_dir/priviledge_id", O_RDWR|O_CREAT|O_EXCL, 02755); This will create SGID set executable in test_dir/. And that's a problem because now an unprivileged user can execute it, get egid=0 and get access to resources owned by "root" group. This is privilege escalation. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2044863 Fixes: CVE-2022-0358 Reported-by: JIETAO XIAO <shawtao1125@gmail.com> Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <YfBGoriS38eBQrAb@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed missing {}'s style nit	2022-01-26 10:32:05 +00:00
Dr. David Alan Gilbert	555a76e5e5	virtiofsd: Error on bad socket group name Make the '--socket-group=' option fail if the group name is unknown: ./tools/virtiofsd/virtiofsd .... --socket-group=zaphod vhost socket: unable to find group 'zaphod' Reported-by: Xiaoling Gao <xiagao@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20211014122554.34599-1-dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-10-25 19:38:32 +01:00
Vivek Goyal	50cf6d6cb7	virtiofsd: Add a helper to stop all queues Use a helper to stop all the queues. Later in the patch series I am planning to use this helper at one more place. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210930153037.1194279-6-vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-10-25 18:58:42 +01:00
Vivek Goyal	c68276556a	virtiofsd: Add a helper to send element on virtqueue We have open coded logic to take locks and push element on virtqueue at three places. Add a helper and use it everywhere. Code is easier to read and less number of lines of code. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210930153037.1194279-5-vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-10-25 18:58:20 +01:00
Vivek Goyal	a88abc6f84	virtiofsd: Remove unused virtio_fs_config definition "struct virtio_fs_config" definition seems to be unused in fuse_virtio.c. Remove it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210930153037.1194279-4-vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-10-25 18:58:02 +01:00
Vivek Goyal	5afc8df46c	virtiofsd: xattr mapping add a new type "unsupported" Right now for xattr remapping, we support types of "prefix", "ok" or "bad". Type "bad" returns -EPERM on setxattr and hides xattr in listxattr. For getxattr, mapping code returns -EPERM but getxattr code converts it to -ENODATA. I need a new semantics where if an xattr is unsupported, then getxattr()/setxattr() return -ENOTSUP and listxattr() should hide the xattr. This is needed to simulate that security.selinux is not supported by virtiofs filesystem and in that case client falls back to some default label specified by policy. So add a new type "unsupported" which returns -ENOTSUP on getxattr() and setxattr() and hides xattrs in listxattr(). For example, one can use following mapping rule to not support security.selinux xattr and allow others. "-o xattrmap=/unsupported/all/security.selinux/security.selinux//ok/all///" Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <YUt9qbmgAfCFfg5t@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-10-25 18:48:23 +01:00
Peter Maydell	7adb961995	virtiofsd pull 2021-08-16 Two minor fixes; one for performance, the other seccomp on s390x. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916' into staging # gpg: Signature made Thu 16 Sep 2021 14:51:38 BST # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916: virtiofsd: Reverse req_list before processing it tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-09-19 18:53:29 +01:00
Sergio Lopez	046d91c83c	virtiofsd: Reverse req_list before processing it With the thread pool disabled, we add the requests in the queue to a GList, processing by iterating over there afterwards. For adding them, we're using "g_list_prepend()", which is more efficient but causes the requests to be processed in reverse order, breaking the read-ahead and request-merging optimizations in the host for sequential operations. According to the documentation, if you need to process the request in-order, using "g_list_prepend()" and then reversing the list with "g_list_reverse()" is more efficient than using "g_list_append()", so let's do it that way. Testing on a spinning disk (to boost the increase of read-ahead and request-merging) shows a 4x improvement on sequential write fio test: Test: fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20 --iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio --rw write --name seqwrite-libaio Without "g_list_reverse()": ... Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s] seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021 write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets ... With "g_list_reverse()": ... Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s] seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021 write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets ... Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20210824131158.39970-1-slp@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-09-16 14:50:48 +01:00
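The prepend-then-reverse pattern from this commit can be illustrated without GLib. The sketch below uses a hand-written singly linked list as a stand-in for `GList`; `struct req_node` and the `req_*` helpers are hypothetical names, not part of virtiofsd or GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request node; stands in for a GList element. */
struct req_node {
    int id;
    struct req_node *next;
};

/* O(1) insertion at the head, which leaves the list in reverse arrival order. */
static struct req_node *req_prepend(struct req_node *head, int id)
{
    struct req_node *n = malloc(sizeof(*n));
    if (n == NULL) {
        abort();
    }
    n->id = id;
    n->next = head;
    return n;
}

/* One O(n) pass that restores arrival order before processing. */
static struct req_node *req_reverse(struct req_node *head)
{
    struct req_node *prev = NULL;
    while (head != NULL) {
        struct req_node *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

/* Copy up to cap ids into out, in list order; returns how many were copied. */
static size_t req_ids(const struct req_node *head, int *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL);
    for (; head != NULL && n < cap; head = head->next) {
        out[n++] = head->id;
    }
    return n;
}

static void req_free(struct req_node *head)
{
    while (head != NULL) {
        struct req_node *next = head->next;
        free(head);
        head = next;
    }
}
```

Appending to a list that only tracks its head costs a full walk per insertion, so n prepends followed by one reversal do O(n) work where n appends would do O(n^2).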
Thomas Huth	8cfd339b3d	tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist The virtiofsd currently crashes on s390x when doing something like this in the guest: mkdir -p /mnt/myfs mount -t virtiofs myfs /mnt/myfs touch /mnt/myfs/foo.txt stat -f /mnt/myfs/foo.txt The problem is that the fstatfs64 syscall is called in this case from the virtiofsd. We have to put it on the seccomp allowlist to avoid that the daemon gets killed in this case. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728 Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210914123214.181885-1-thuth@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-09-16 14:50:48 +01:00
Michael Tokarev	68857f13aa	spelling: sytem => system Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <fefb5f5c-82bc-05e2-b4c1-665e9d6896ff@msgid.tls.msk.ru> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-09-15 15:51:07 +02:00
Hubert Jasudowicz	7ef2408a96	virtiofsd: Add missing newline in error message Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <e5914ad202a13e9c1bc2a5efa267ff3bd4f48db6.1625173475.git.hubert.jasudowicz@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Vivek Goyal	65a820d292	virtiofsd: Add an option to enable/disable posix acls fuse has an option FUSE_POSIX_ACL which needs to be opted in by fuse server to enable posix acls. As of now we are not opting in for this, so posix acls are disabled on virtiofs by default. Add virtiofsd option "-o posix_acl/no_posix_acl" to let users enable/disable posix acl support. By default it is disabled as of now due to performance concerns with cache=none. Currently even if file server has not opted in for FUSE_POSIX_ACL, user can still query acl and set acl, and system.posix_acl_access and system.posix_acl_default xattrs show up in the listxattr response. Miklos said this is confusing. So he suggested we block and filter system.posix_acl_access and system.posix_acl_default xattrs in getxattr/setxattr/listxattr if user has explicitly disabled posix acls using -o no_posix_acl. For now, keep the existing behavior if user did not specify any option to disable acl support, due to concerns about backward compatibility. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210622150852.1507204-8-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Vivek Goyal	f1aa1774df	virtiofsd: Switch creds, drop FSETID for system.posix_acl_access xattr When posix access acls are set on a file, it can lead to adjusting file permissions (mode) as well. If caller does not have CAP_FSETID and it also does not have membership of owner group, this will lead to clearing SGID bit in mode. Current fuse code is written in such a way that it expects file server to take care of changing file mode (permission), if there is a need. Right now, host kernel does not clear SGID bit because virtiofsd is running as root and has CAP_FSETID. For host kernel to clear SGID, virtiofsd need to switch to gid of caller in guest and also drop CAP_FSETID (if caller did not have it to begin with). If SGID needs to be cleared, client will set the flag FUSE_SETXATTR_ACL_KILL_SGID in setxattr request. In that case server should kill sgid. Currently just switch to uid/gid of the caller and drop CAP_FSETID and that should do it. This should fix the xfstest generic/375 test case. We don't have to switch uid for this to work. That could be one optimization that pass a parameter to lo_change_cred() to only switch gid and not uid. Also this will not work whenever (if ever) we support idmapped mounts. In that case it is possible that uid/gid in request are 0/0 but still we need to clear SGID. So we will have to pick a non-root sgid and switch to that instead. That's a TODO item for future when idmapped mount support is introduced. This patch only adds the capability to switch creds and drop FSETID when acl xattr is set. This does not take effect yet. It can take effect when next patch adds the capability to enable posix_acl. Reported-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210622150852.1507204-7-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Vivek Goyal	227e5d7fd5	virtiofsd: Add capability to change/restore umask When parent directory has default acl and a file is created in that directory, then umask is ignored and final file permissions are determined using default acl instead. (man 2 umask). Currently, fuse applies the umask and sends modified mode in create request accordingly. fuse server can set FUSE_DONT_MASK and tell fuse client to not apply umask and fuse server will take care of it as needed. With posix acls enabled, requirement will be that we want umask to determine final file mode if parent directory does not have default acl. So if posix acls are enabled, opt in for FUSE_DONT_MASK. virtiofsd will set umask of the thread doing file creation. And host kernel should use that umask if parent directory does not have default acls, otherwise umask does not take effect. Miklos mentioned that we already call unshare(CLONE_FS) for every thread. That means umask has now become property of per thread and it should be ok to manipulate it in file creation path. This patch only adds capability to change umask and restore it. It does not enable it yet. Next few patches will add capability to enable it based on if user enabled posix_acl or not. This should fix fstest generic/099. Reported-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210622150852.1507204-6-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
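The change-and-restore idea in this commit rests on the fact that `umask()` returns the previous mask. A minimal sketch, assuming hypothetical helper names (`struct umask_guard`, `umask_push()`, `umask_pop()`) that do not appear in virtiofsd:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical guard that remembers the mask that was active before the change. */
struct umask_guard {
    mode_t saved;
};

/* Install the caller-supplied mask and remember the old one. */
static void umask_push(struct umask_guard *g, mode_t mask)
{
    assert(g != NULL);
    g->saved = umask(mask);  /* umask() returns the previous mask */
}

/* Put the remembered mask back once file creation is done. */
static void umask_pop(const struct umask_guard *g)
{
    assert(g != NULL);
    umask(g->saved);
}
```

File creation would sit between `umask_push()` and `umask_pop()`. Note that umask is normally shared by all threads of a process, so this pattern is only safe per thread when each thread has its own filesystem attributes, which the commit message says is the case because of `unshare(CLONE_FS)`.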
Vivek Goyal	6d0028b947	virtiofsd: Add umask to seccomp allow list Patches in this series are going to make use of "umask" syscall. So allow it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210622150852.1507204-5-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Vivek Goyal	c46ef954fa	virtiofsd: Add support for extended setxattr Add the bits to enable support for setxattr_ext if fuse offers it. Do not enable it by default yet. Let passthrough_ll opt-in. Enabling it by default kind of automatically means that you are taking responsibility of clearing SGID if ACL is set. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210622150852.1507204-4-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up double def in fuse_common.h	2021-07-05 10:51:26 +01:00
Vivek Goyal	5290fb625d	virtiofsd: Fix xattr operations overwriting errno getxattr/setxattr/removexattr/listxattr operations handle regular and non-regular files differently. For the case of non-regular files we do fchdir(/proc/self/fd) and the xattr operation and then revert back to original working directory. After this we are saving errno and that's buggy because fchdir() will overwrite the errno. FCHDIR_NOFAIL(lo->proc_self_fd); ret = getxattr(procname, name, value, size); FCHDIR_NOFAIL(lo->root.fd); if (ret == -1) saverr = errno In above example, if getxattr() failed, we will still return 0 to caller as errno must have been written by FCHDIR_NOFAIL(lo->root.fd) call. Fix all such instances and capture "errno" early and save in "saverr" variable. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210622150852.1507204-3-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
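The fix described in this commit is to capture `errno` right after the failing call, before any follow-up call can overwrite it. The sketch below shows the ordering in isolation; `run_and_save_errno()` and the two `demo_*` functions are hypothetical stand-ins for the xattr call and the `FCHDIR_NOFAIL()` step, not virtiofsd code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Run op(), record its errno immediately, then run restore().
 * restore() may change errno, which is why the value is captured first.
 */
static int run_and_save_errno(int (*op)(void), void (*restore)(void),
                              int *saverr)
{
    int ret;

    assert(op != NULL && restore != NULL && saverr != NULL);
    ret = op();
    *saverr = (ret == -1) ? errno : 0;  /* capture before anything else runs */
    restore();
    return ret;
}

/* Demo stand-in for a failing xattr call. */
static int demo_failing_op(void)
{
    errno = ENOENT;
    return -1;
}

/* Demo stand-in for a follow-up call that clobbers errno. */
static void demo_restore(void)
{
    errno = 0;
}
```

Calling `run_and_save_errno(demo_failing_op, demo_restore, &saverr)` leaves `ENOENT` in `saverr` even though `errno` itself has been reset by the time the function returns.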
Vivek Goyal	061624455f	virtiofsd: Fix fuse setxattr() API change issue With kernel header updates fuse_setxattr_in struct has grown in size. But this new struct size only takes affect if user has opted in for fuse feature FUSE_SETXATTR_EXT otherwise fuse continues to send "fuse_setxattr_in" of older size. Older size is determined by FUSE_COMPAT_SETXATTR_IN_SIZE. Fix this. If we have not opted in for FUSE_SETXATTR_EXT, then expect that we will get fuse_setxattr_in of size FUSE_COMPAT_SETXATTR_IN_SIZE and not sizeof(struct fuse_sexattr_in). Fixes: `278f064e45` ("Update Linux headers to 5.13-rc4") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210622150852.1507204-2-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Greg Kurz	1d03e56607	virtiofsd: Don't allow file creation with FUSE_OPEN A well behaved FUSE client uses FUSE_CREATE to create files. It isn't supposed to pass O_CREAT along a FUSE_OPEN request, as documented in the "fuse_lowlevel.h" header : /** * Open a file * * Open flags are available in fi->flags. The following rules * apply. * * - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be * filtered out / handled by the kernel. But if the client happens to do it anyway, the server ends up passing this flag to open() without the mandatory mode_t 4th argument. Since open() is a variadic function, glibc will happily pass whatever it finds on the stack to the syscall. If this file is compiled with -D_FORTIFY_SOURCE=2, glibc will even detect that and abort: * invalid openat64 call: O_CREAT or O_TMPFILE without mode *: terminated Specifying O_CREAT with FUSE_OPEN is a protocol violation. Check this in do_open(), print out a message and return an error to the client, EINVAL like we already do when fuse_mbuf_iter_advance() fails. The FUSE filesystem doesn't currently support O_TMPFILE, but the very same would happen if O_TMPFILE was passed in a FUSE_OPEN request. Check that as well. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210624101809.48032-1-groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Daniel P. Berrangé	d9a801f7e9	virtiofsd: use GDateTime for formatting timestamp for debug messages The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Localtime is changed to UTC to avoid the need to grant extra seccomp permissions for GLib's access of the timezone database. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210611164319.67762-1-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Mahmoud Mandour	bf99f30bc3	tools/virtiofsd/fuse_opt.c: Replaced a malloc with GLib's g_try_malloc Replaced a malloc() call and its respective free() with GLib's g_try_malloc() and g_free() calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210314032324.45142-8-ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Mahmoud Mandour	d14d4f4f18	tools/virtiofsd/buffer.c: replaced a calloc call with GLib's g_try_new0 Replaced a call to calloc() and its respective free() call with GLib's g_try_new0() and g_free() calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210314032324.45142-7-ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	b5fd59cf90	virtiofsd: Set req->reply_sent right after sending reply There is no reason to set it in label "err". We should be able to set it right after sending reply. It is easier to read. Also got rid of label "err" because now only thing it was doing was return a code. We can return from the error location itself and no need to first jump to label "err". Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-8-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	1a5fff8e63	virtiofsd: Check EOF before short read In virtio_send_data_iov() we are checking first for short read and then EOF condition. Change the order. Basically check for error and EOF first and last remaining piece is short ready which will lead to retry automatically at the end of while loop. Just that it is little simpler to read to the code. There is no need to call "continue" and also one less call of "len-=ret". Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-7-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	bf7a3ee044	virtiofsd: Simplify skip byte logic We need to skip bytes in two cases. a. Before we start reading into in_sg, we need to skip iov_len bytes in the beginning which typically will have fuse_out_header. b. If preadv() does a short read, then we need to retry preadv() with remainig bytes and skip the bytes preadv() read in short read. For case a, there is no reason that skipping logic be inside the while loop. Move it outside. And only retain logic "b" inside while loop. Also get rid of variable "skip_size". Looks like we can do without it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-6-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	0106f6f234	virtiofsd: get rid of in_sg_left variable in_sg_left seems to be being used primarly for debugging purpose. It is keeping track of how many bytes are left in the scatter list we are reading into. We already have another variable "len" which keeps track how many bytes are left to be read. And in_sg_left is greater than or equal to len. We have already ensured that in the beginning of function. if (in_len < tosend_len) { fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n", __func__, elem->index, tosend_len); ret = E2BIG; goto err; } So in_sg_left seems like a redundant variable. It probably was useful for debugging when code was being developed. Get rid of it. It helps simplify this function. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-5-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	97dbfc5ae6	virtiofsd: Use iov_discard_front() to skip bytes There are places where we need to skip few bytes from front of the iovec array. We have our own custom code for that. Looks like iov_discard_front() can do same thing. So use that helper instead. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-4-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	b31ff38931	virtiofsd: Get rid of unreachable code in read pvreadv() can return following. - error - 0 in case of EOF - short read We seem to handle all the cases already. We are retrying read in case of short read. So another check for short read seems like dead code. Get rid of it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-3-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Vivek Goyal	04c9f7e04a	virtiofsd: Check for EINTR in preadv() and retry We don't seem to check for EINTR and retry. There are other places in code where we check for EINTR. So lets add a check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Greg Kurz	4962b312cd	virtiofsd: Fix check of chown()'s return value Otherwise you always get this warning when using --socket-group=users vhost socket failed to set group to users (100) While here, print out the error if chown() fails. Fixes: `f6698f2b03` ("tools/virtiofsd: add support for --socket-group") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <162040394890.714971.15502455176528384778.stgit@bahia.lan> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-13 17:48:47 +02:00
Mahmoud Mandour	67a010f64c	virtiofsd/fuse_virtio.c: Changed allocations of locals to GLib Replaced the allocation of local variables from malloc() to GLib allocation functions. In one instance, dropped the usage to an assert after a malloc() call and used g_malloc() instead. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-8-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-06 19:47:44 +01:00

254 Commits