88687 Commits

Author SHA1 Message Date
Peter Maydell
9bef7ea9d9 9pfs: misc patches
* Add link to 9p developer docs.
 
 * Fix runtime check whether client supplied relative path is the export
   root.
 
 * Performance optimization of Twalk requests.
 
 * Code cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEltjREM96+AhPiFkBNMK1h2Wkc5UFAmDi6V4XHHFlbXVfb3Nz
 QGNydWRlYnl0ZS5jb20ACgkQNMK1h2Wkc5WsiA//dwLfbNFE9dxAZF3JzwYPmgID
 ino0zruJxC2UdHaVYr2r0A6H+uL/K1zFTWj53R1EKo6udhsx5avqJvmuLU5np9MA
 CEflB99YvwEQWLNyM7IRT+IXa+ebe+UvqC7ouRmJeTRuBaSEu7TMj5S2zMj+AD4b
 uNGVylm6T8yAt+r8QQar+TvK9KYuPuHt/yJgzXqg8tDfly1GGzLh2uoS/0EZoJQU
 4emyYadMsBMVOh+E29sqFMBpUnHwlLl+t9JpUl3xVXM1ZShcqHBl2QbQGJIDsfh7
 HxXBMKHLiSzJrFJCc2DklsOlaRlP5nZdCVEcO4B0/Sq1kZpmf6r+4V6uYrdu8cCE
 tP33QNhC1yWQk9FgvItGkqRfAOI/KK02TE8WWJuLxbyo2n62lE0rEU1gBJOGPkxQ
 rJTGUiFgweV5Ky+NMRbrB4P8EurPLvcgFEhz7qfOEBanmZKHylzI3vwQg5e9Um/t
 ZABjIfjm95/QB6ufeCvVuktGnmyfBC/WJSZwiDyRG+tBf1V1TJF1P9EwoBOyYexv
 WGHoYfAExytKbyzgsKPZ32zTGzzKLtTBko+ATIEFPCpsf5TWiSzralzz3c2/DF3j
 5PHuPOdL+7nhUN3CSRi2MJzhg+LKW34ca1G8XRpob0u9RkKSjH6VbqKhysB8eQNm
 kE1lfhSe00ZtuZoTPto=
 =WqCY
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20210705' into staging

9pfs: misc patches

* Add link to 9p developer docs.

* Fix runtime check whether client supplied relative path is the export
  root.

* Performance optimization of Twalk requests.

* Code cleanup.

# gpg: Signature made Mon 05 Jul 2021 12:13:34 BST
# gpg:                using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395
# gpg:                issuer "qemu_oss@crudebyte.com"
# gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: ECAB 1A45 4014 1413 BA38  4926 30DB 47C3 A012 D5F4
#      Subkey fingerprint: 96D8 D110 CF7A F808 4F88  5901 34C2 B587 65A4 7395

* remotes/cschoenebeck/tags/pull-9p-20210705:
  9pfs: reduce latency of Twalk
  9pfs: drop root_qid
  9pfs: replace not_same_qid() by same_stat_id()
  9pfs: drop fid_to_qid()
  9pfs: capture root stat
  9pfs: fix not_same_qid()
  9pfs: simplify v9fs_walk()
  9pfs: add link to 9p developer docs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-05 17:25:02 +01:00
Peter Maydell
715167a36c Migration and virtiofs pull 2021-07-01 v2
Dropped Peter Xu's migration-test fix to reenable
 most of the migration tests when uffd isn't available;
 we're seeing at least one seg in github CI (on qemu-system-i386)
 and Peter Maydell is reporting a hang on Openbsd.
 
 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmDi2H8ACgkQBRYzHrxb
 /edgRw//cr5xtzaTP2G3zP/R4T06Hp8xVgGNz8lXuqXOctLJNxMgj2uaty1fq4AL
 vaRHhQ/DFgh5aQU2FbnoMFeaOdsGLbAla+p7bKuDswMGbQnl1sy/qRgmeJyN7McS
 2Zo/pc4xkCv3YkUpVJnvysn4vyA7xyQ6a4ndmmsI3EyqX5d3IWaTrK6vRbFpi/tI
 en6QVH1x5zXkygmFQddyhBNyS1OQGane3AhaetZs5y8FSFUsBcuixsgmSZZIhN5Y
 Y1Yk+qRTQfdXVCYaBjbxoWk7yIiw/X4xy0wwrZUDc6zScv64nZDu+V1olq0DBGsg
 Zo5lFfNx/mzq5FwjD3eU27PSaa9zHOnBmkvIhToGU+od+Ct4sB2Xbq4e2FtVYgX1
 cOhptfE2OewKbNsSHWvKL6UqgxJ4tT+6COclwrdO7bqnIuWpJWPORWYmR5fKvN42
 xnPZYopTpFJkEv2jKZAVif2tWnD+l806ebNtkAlldFwbAudzq4+qMCbMeDopyBP2
 ksrUTchmyFq8nm7cuRvEm2z9EHDLHP/RtwsUZmTiY1A+d1B5HUO2ndByWO/buoc0
 N6Y+UwTldi7ZK2bWphEpNsEqtT2vZs5X5CB3nWLOqPGoPYWnl6H58T/3kh9W/fvG
 ycVoxZ3iH5rr3lsilQcFDwQyG3nprrWj2CO0V7b+oI5BBCdpOnQ=
 =PBi+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-migration-20210705a' into staging

Migration and virtiofs pull 2021-07-01 v2

Dropped Peter Xu's migration-test fix to reenable
most of the migration tests when uffd isn't available;
we're seeing at least one seg in github CI (on qemu-system-i386)
and Peter Maydell is reporting a hang on Openbsd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

# gpg: Signature made Mon 05 Jul 2021 11:01:35 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-migration-20210705a:
  migration/rdma: Use error_report to suppress errno message
  tests/migration: fix "downtime_limit" type when "migrate-set-parameters"
  tests/migration: parse the thread-id key of CpuInfoFast
  virtiofsd: Add an option to enable/disable posix acls
  virtiofsd: Switch creds, drop FSETID for system.posix_acl_access xattr
  virtiofsd: Add capability to change/restore umask
  virtiofsd: Add umask to seccom allow list
  virtiofsd: Add support for extended setxattr
  virtiofsd: Fix xattr operations overwriting errno
  virtiofsd: Fix fuse setxattr() API change issue
  virtiofsd: Don't allow file creation with FUSE_OPEN
  docs: describe the security considerations with virtiofsd xattr mapping
  virtiofsd: use GDateTime for formatting timestamp for debug messages
  migration: failover: continue to wait card unplug on error
  migration: move wait-unplug loop to its own function
  migration: Allow reset of postcopy_recover_triggered when failed
  migration: Move yank outside qemu_start_incoming_migration()
  migration: fix the memory overwriting risk in add_to_iovec
  tests: migration-test: Add dirty ring test

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-05 12:45:24 +01:00
Christian Schoenebeck
8d6cb10073 9pfs: reduce latency of Twalk
As with previous performance optimization on Treaddir handling;
reduce the overall latency, i.e. overall time spent on processing
a Twalk request by reducing the amount of thread hops between the
9p server's main thread and fs worker thread(s).

In fact this patch even reduces the thread hops for Twalk handling
to its theoritical minimum of exactly 2 thread hops:

main thread -> fs worker thread -> main thread

This is achieved by doing all the required fs driver tasks altogether
in a single v9fs_co_run_in_worker({ ... }); code block.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <1a6701674afc4f08d40396e3aa2631e18a4dbb33.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
66550339b7 9pfs: drop root_qid
There is no longer a user of root_qid, so drop it.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <6896dd161d3257db6b0513842a14f87ca191fdf6.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
f22cad4228 9pfs: replace not_same_qid() by same_stat_id()
As we are actually only comparing the filesystem ID (i.e. device number
and inode number pair) let's use the POSIX stat buffer instead of QIDs,
because resolving QIDs requires to be done on 9p server's main thread
only as it might mutate the server state if inode remapping is enabled.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <26aa465ff9cc9c07e053331554a02fdae3994417.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
1d0fc0d0ee 9pfs: drop fid_to_qid()
There is only one user of fid_to_qid() which is v9fs_walk(). Let's
open-code fid_to_qid() directly within v9fs_walk(), because
fid_to_qid() hides the POSIX stat buffer which we are going to need
in the subsequent patch.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <e9a4c9c7a0792ed4db6578d105a0823ea05bc324.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
110243750d 9pfs: capture root stat
We already capture the QID of the exported 9p root path, i.e. to
prevent client access outside the defined, exported filesystem's tree.
This is currently checked by comparing the root QID with another FID's
QID.

The problem with the latter is that resolving a QID of any given 9p path
can only be done on 9p server's main thread, that's because it might
mutate the server's state if inode remapping is enabled.

For that reason also capture the POSIX stat info of the root path for
being able to identify on any (e.g. worker) thread whether an
arbitrary given path is identical to the export root.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <eb07d6c2e9925788454cfe33d3802e4ffb23ea9a.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
8bf27550ef 9pfs: fix not_same_qid()
There is only one user of not_same_qid() which is v9fs_walk() and the
latter is using it for comparing a client supplied path with the 9p
export root path, for the sole purpose to prevent a Twalk request
from escaping from the exported 9p tree via "..".

However for that specific purpose the implementation of not_same_qid()
is wrong; if mtime of the 9p export root path changed between Tattach
and Twalk then not_same_qid() returns true when actually comparing
against the export root path.

To fix for the actual semantic being used, only compare QID path
members, but do not compare version or type members.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <ca0abae4a899d81c6e87f683732d6c1f56915232.1622821729.git.qemu_oss@crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
232a4d2c25 9pfs: simplify v9fs_walk()
There is only one comparison between nwnames and P9_MAXWELEM required.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1liKiz-0006BC-Ja@lizzy.crudebyte.com>
2021-07-05 13:03:16 +02:00
Christian Schoenebeck
6f56908427 9pfs: add link to 9p developer docs
To lower the entry level for new developers, add a link to the 9p
developer docs (i.e. qemu wiki) to MAINTAINERS and to the beginning of
9p source files, that is to: https://wiki.qemu.org/Documentation/9p

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Acked-by: Greg Kurz <groug@kaod.org>
Message-Id: <E1leeDf-0008GZ-9q@lizzy.crudebyte.com>
2021-07-05 13:03:16 +02:00
Stefan Hajnoczi
023ca420ee util/async: print leaked BH name when AioContext finalizes
BHs must be deleted before the AioContext is finalized. If not, it's a
bug and probably indicates that some part of the program still expects
the BH to run in the future. That can lead to memory leaks, inconsistent
state, or just hangs.

Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize()
is difficult to debug because the assertion failure contains no
information about the BH!

Use the QEMUBH name field added in the previous patch to show a useful
error when a leaked BH is detected.

Suggested-by: Eric Ernst <eric.g.ernst@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210414200247.917496-3-stefanha@redhat.com>
2021-07-05 11:40:32 +01:00
Stefan Hajnoczi
0f08586c71 util/async: add a human-readable name to BHs for debugging
It can be difficult to debug issues with BHs in production environments.
Although BHs can usually be identified by looking up their ->cb()
function pointer, this requires debug information for the program. It is
also not possible to print human-readable diagnostics about BHs because
they have no identifier.

This patch adds a name to each BH. The name is not unique per instance
but differentiates between cb() functions, which is usually enough. It's
done by changing aio_bh_new() and friends to macros that stringify cb.

The next patch will use the name field when reporting leaked BHs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210414200247.917496-2-stefanha@redhat.com>
2021-07-05 11:40:32 +01:00
Li Zhijian
e5f607913c migration/rdma: Use error_report to suppress errno message
Since the prior calls are successful, in this case a errno doesn't
indicate a real error which would just make us confused.

before:
(qemu) migrate -d rdma:192.168.22.23:8888
source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No space left on device

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <20210628071959.23455-1-lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Hyman Huang(黄勇)
fa264f4266 tests/migration: fix "downtime_limit" type when "migrate-set-parameters"
migrate-set-parameters parse "downtime_limit" as integer type when
execute "migrate-set-parameters" before migration, and, the unit
dowtime_limit is milliseconds, fix this two so that test can go
smoothly.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <31d82df24cc0c468dbe4d2d86730158ebf248071.1622729934.git.huangy81@chinatelecom.cn>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Hyman Huang(黄勇)
c99fb3a50d tests/migration: parse the thread-id key of CpuInfoFast
thread_id in CpuInfoFast is deprecated, parse thread-id instead
after execute qmp query-cpus-fast. fix this so that test can
go smoothly.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <584578c0a0dd781cee45f72ddf517f6e6a41c504.1622729934.git.huangy81@chinatelecom.cn>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
65a820d292 virtiofsd: Add an option to enable/disable posix acls
fuse has an option FUSE_POSIX_ACL which needs to be opted in by fuse
server to enable posix acls. As of now we are not opting in for this,
so posix acls are disabled on virtiofs by default.

Add virtiofsd option "-o posix_acl/no_posix_acl" to let users enable/disable
posix acl support. By default it is disabled as of now due to performance
concerns with cache=none.

Currently even if file server has not opted in for FUSE_POSIX_ACL, user can
still query acl and set acl, and system.posix_acl_access and
system.posix_acl_default xattrs show up listxattr response.

Miklos said this is confusing. So he said lets block and filter
system.posix_acl_access and system.posix_acl_default xattrs in
getxattr/setxattr/listxattr if user has explicitly disabled
posix acls using -o no_posix_acl.

As of now continuing to keeping the existing behavior if user did not
specify any option to disable acl support due to concerns about backward
compatibility.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210622150852.1507204-8-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
f1aa1774df virtiofsd: Switch creds, drop FSETID for system.posix_acl_access xattr
When posix access acls are set on a file, it can lead to adjusting file
permissions (mode) as well. If caller does not have CAP_FSETID and it
also does not have membership of owner group, this will lead to clearing
SGID bit in mode.

Current fuse code is written in such a way that it expects file server
to take care of chaning file mode (permission), if there is a need.
Right now, host kernel does not clear SGID bit because virtiofsd is
running as root and has CAP_FSETID. For host kernel to clear SGID,
virtiofsd need to switch to gid of caller in guest and also drop
CAP_FSETID (if caller did not have it to begin with).

If SGID needs to be cleared, client will set the flag
FUSE_SETXATTR_ACL_KILL_SGID in setxattr request. In that case server
should kill sgid.

Currently just switch to uid/gid of the caller and drop CAP_FSETID
and that should do it.

This should fix the xfstest generic/375 test case.

We don't have to switch uid for this to work. That could be one optimization
that pass a parameter to lo_change_cred() to only switch gid and not uid.

Also this will not work whenever (if ever) we support idmapped mounts. In
that case it is possible that uid/gid in request are 0/0 but still we
need to clear SGID. So we will have to pick a non-root sgid and switch
to that instead. That's an TODO item for future when idmapped mount
support is introduced.

This patch only adds the capability to switch creds and drop FSETID
when acl xattr is set. This does not take affect yet. It can take
affect when next patch adds the capability to enable posix_acl.

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210622150852.1507204-7-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
227e5d7fd5 virtiofsd: Add capability to change/restore umask
When parent directory has default acl and a file is created in that
directory, then umask is ignored and final file permissions are
determined using default acl instead. (man 2 umask).

Currently, fuse applies the umask and sends modified mode in create
request accordingly. fuse server can set FUSE_DONT_MASK and tell
fuse client to not apply umask and fuse server will take care of
it as needed.

With posix acls enabled, requirement will be that we want umask
to determine final file mode if parent directory does not have
default acl.

So if posix acls are enabled, opt in for FUSE_DONT_MASK. virtiofsd
will set umask of the thread doing file creation. And host kernel
should use that umask if parent directory does not have default
acls, otherwise umask does not take affect.

Miklos mentioned that we already call unshare(CLONE_FS) for
every thread. That means umask has now become property of per
thread and it should be ok to manipulate it in file creation path.

This patch only adds capability to change umask and restore it. It
does not enable it yet. Next few patches will add capability to enable it
based on if user enabled posix_acl or not.

This should fix fstest generic/099.

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210622150852.1507204-6-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
6d0028b947 virtiofsd: Add umask to seccom allow list
Patches in this series  are going to make use of "umask" syscall.
So allow it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210622150852.1507204-5-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
c46ef954fa virtiofsd: Add support for extended setxattr
Add the bits to enable support for setxattr_ext if fuse offers it. Do not
enable it by default yet. Let passthrough_ll opt-in. Enabling it by deafult
kind of automatically means that you are taking responsibility of clearing
SGID if ACL is set.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210622150852.1507204-4-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fixed up double def in fuse_common.h
2021-07-05 10:51:26 +01:00
Vivek Goyal
5290fb625d virtiofsd: Fix xattr operations overwriting errno
getxattr/setxattr/removexattr/listxattr operations handle regualar
and non-regular files differently. For the case of non-regular files
we do fchdir(/proc/self/fd) and the xattr operation and then revert
back to original working directory. After this we are saving errno
and that's buggy because fchdir() will overwrite the errno.

FCHDIR_NOFAIL(lo->proc_self_fd);
ret = getxattr(procname, name, value, size);
FCHDIR_NOFAIL(lo->root.fd);

if (ret == -1)
    saverr = errno

In above example, if getxattr() failed, we will still return 0 to caller
as errno must have been written by FCHDIR_NOFAIL(lo->root.fd) call.
Fix all such instances and capture "errno" early and save in "saverr"
variable.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210622150852.1507204-3-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Vivek Goyal
061624455f virtiofsd: Fix fuse setxattr() API change issue
With kernel header updates fuse_setxattr_in struct has grown in size.
But this new struct size only takes affect if user has opted in
for fuse feature FUSE_SETXATTR_EXT otherwise fuse continues to
send "fuse_setxattr_in" of older size. Older size is determined
by FUSE_COMPAT_SETXATTR_IN_SIZE.

Fix this. If we have not opted in for FUSE_SETXATTR_EXT, then
expect that we will get fuse_setxattr_in of size FUSE_COMPAT_SETXATTR_IN_SIZE
and not sizeof(struct fuse_sexattr_in).

Fixes: 278f064e4524 ("Update Linux headers to 5.13-rc4")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210622150852.1507204-2-vgoyal@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Greg Kurz
1d03e56607 virtiofsd: Don't allow file creation with FUSE_OPEN
A well behaved FUSE client uses FUSE_CREATE to create files. It isn't
supposed to pass O_CREAT along a FUSE_OPEN request, as documented in
the "fuse_lowlevel.h" header :

    /**
     * Open a file
     *
     * Open flags are available in fi->flags. The following rules
     * apply.
     *
     *  - Creation (O_CREAT, O_EXCL, O_NOCTTY) flags will be
     *    filtered out / handled by the kernel.

But if the client happens to do it anyway, the server ends up passing
this flag to open() without the mandatory mode_t 4th argument. Since
open() is a variadic function, glibc will happily pass whatever it
finds on the stack to the syscall. If this file is compiled with
-D_FORTIFY_SOURCE=2, glibc will even detect that and abort:

*** invalid openat64 call: O_CREAT or O_TMPFILE without mode ***: terminated

Specifying O_CREAT with FUSE_OPEN is a protocol violation. Check this
in do_open(), print out a message and return an error to the client,
EINVAL like we already do when fuse_mbuf_iter_advance() fails.

The FUSE filesystem doesn't currently support O_TMPFILE, but the very
same would happen if O_TMPFILE was passed in a FUSE_OPEN request. Check
that as well.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210624101809.48032-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Daniel P. Berrangé
3399bca451 docs: describe the security considerations with virtiofsd xattr mapping
Different guest xattr prefixes have distinct access control rules applied
by the guest. When remapping a guest xattr care must be taken that the
remapping does not allow the a guest user to bypass guest kernel access
control rules.

For example if 'trusted.*' which requires CAP_SYS_ADMIN is remapped
to 'user.virtiofs.trusted.*', an unprivileged guest user which can
write to 'user.*' can bypass the CAP_SYS_ADMIN control. Thus the
target of any remapping must be explicitly blocked from read/writes
by the guest, to prevent access control bypass.

The examples shown in the virtiofsd man page already do the right
thing and ensure safety, but the security implications of getting
this wrong were not made explicit. This could lead to host admins
and apps unwittingly creating insecure configurations.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210611120427.49736-1-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Daniel P. Berrangé
d9a801f7e9 virtiofsd: use GDateTime for formatting timestamp for debug messages
The GDateTime APIs provided by GLib avoid portability pitfalls, such
as some platforms where 'struct timeval.tv_sec' field is still 'long'
instead of 'time_t'. When combined with automatic cleanup, GDateTime
often results in simpler code too.

Localtime is changed to UTC to avoid the need to grant extra seccomp
permissions for GLib's access of the timezone database.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210611164319.67762-1-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Laurent Vivier
944bc52842 migration: failover: continue to wait card unplug on error
If the user cancels the migration in the unplug-wait state,
QEMU will try to plug back the card and this fails because the card
is partially unplugged.
To avoid the problem, continue to wait the card unplug, but to
allow the migration to be canceled if the card never finishes to unplug
use a timeout.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629155007.629086-3-lvivier@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Laurent Vivier
fde93d99d9 migration: move wait-unplug loop to its own function
The loop is used in migration_thread() and bg_migration_thread(),
so we can move it to its own function and call it from these both places.

Moreover, in migration_thread() we have a wrong state transition from
SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly
managed in bg_migration_thread() so use this code instead.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210629155007.629086-2-lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Peter Xu
b7f9afd48e migration: Allow reset of postcopy_recover_triggered when failed
It's possible qemu_start_incoming_migration() failed at any point, when it
happens we should reset postcopy_recover_triggered to false so that the user
can still retry with a saner incoming port.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210629181356.217312-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Peter Xu
cc48c587d2 migration: Move yank outside qemu_start_incoming_migration()
Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister
before calling qemu_start_incoming_migration(). I believe it wanted to mitigate
the next call to yank_register_instance(), but I think that's wrong.

Firstly, if during recover, we should keep the yank instance there, not
"quickly removing and adding it back".

Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will directly
crash the dest qemu (right now it can't; but it'll start to work right after
the next patch) because the 1st call of qmp_migrate_recover() will unregister
permanently when the channel failed to establish, then the 2nd call of
qmp_migrate_recover() crashes at yank_unregister_instance().

This patch fixes it by moving yank ops out of qemu_start_incoming_migration()
into qmp_migrate_incoming.  For qmp_migrate_recover(), drop the unregister of
yank instance too since we keep it there during the recovery phase.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210629181356.217312-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Feng Lin
c00d434ac6 migration: fix the memory overwriting risk in add_to_iovec
When testing migration, a Segmentation fault qemu core is generated.
0  error_free (err=0x1)
1  0x00007f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640)
2  0x00007f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0)
3  0x00007f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0)
4  0x00007f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0)
5  0x00007f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0)
6  0x00007f8b8626a33d in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
7  0x00007f8b866bdba4 in g_main_context_dispatch ()
8  0x00007f8b8626cde9 in glib_pollfds_poll ()
9  0x00007f8b8626ce62 in os_host_main_loop_wait (timeout=<optimized out>)
10 0x00007f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0)
11 0x00007f8b862ef01f in main_loop ()
Using gdb print the struct QEMUFile f = {
  ...,
  iovcnt = 65, last_error = 21984,
  last_error_obj = 0x1, shutdown = true
}
Well iovcnt is overflow, because the max size of MAX_IOV_SIZE is 64.
struct QEMUFile {
    ...;
    struct iovec iov[MAX_IOV_SIZE];
    unsigned int iovcnt;
    int last_error;
    Error *last_error_obj;
    bool shutdown;
};
iovcnt and last_error is overwrited by add_to_iovec().
Right now, add_to_iovec() increase iovcnt before check the limit.
And it seems that add_to_iovec() assumes that iovcnt will set to zero
in qemu_fflush(). But qemu_fflush() will directly return when f->shutdown
is true.

The situation may occur when libvirtd restart during migration, after
f->shutdown is set, before calling qemu_file_set_error() in
qemu_file_shutdown().

So the safiest way is checking the iovcnt before increasing it.

Signed-off-by: Feng Lin <linfeng23@huawei.com>
Message-Id: <20210625062138.1899-1-linfeng23@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Fix typo in 'writeable' which is actually misnamed 'writable'
2021-07-05 10:51:26 +01:00
Peter Xu
1f546b709d tests: migration-test: Add dirty ring test
Add dirty ring test if kernel supports it.  Add the dirty ring parameter on
source should be mostly enough, but let's change the dest too to make them
match always.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210615175523.439830-3-peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05 10:51:26 +01:00
Peter Maydell
4fb2820854 PVRDMA queue
Several CVE fixes for the PVRDMA device.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJg4hJVAAoJEDbUwPDPL+RtGEkIAJs7yas9LdaP80scaEQ3UiOJ
 Mr9M2+RwoUm19YhrM2MqjIByluB5Jk2bO1ZmpuJmeg03V15I8AV8MozKGAzZaThx
 8LVwduqlmdnrprBYKVRRn/SLSqbMkRSXGli9BjX2PoF97VeqvhNOxH1hL6z8Ytn6
 zzpsOSq7NmC87hLdvL2UIkpYy1Z4bUjAWr1wn7K6jIhM163pek61rmO9eWwc7W3y
 M1lXtGtrLgE0AuLm0oEJwz0xqlOuN8ld5kwIEu33RnL8phLwT0hDjHoLPqIX79bS
 UdIL1zNjwXKI0DkR5EoVLRMdF87b3LMfVEvij25PNLPf6gyyh5FHmdqQi0r3sZU=
 =MrNO
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/marcel/tags/pvrdma-04-07-2021-v2' into staging

PVRDMA queue

Several CVE fixes for the PVRDMA device.

# gpg: Signature made Sun 04 Jul 2021 20:56:05 BST
# gpg:                using RSA key 36D4C0F0CF2FE46D
# gpg: Good signature from "Marcel Apfelbaum <marcel.apfelbaum@zoho.com>" [marginal]
# gpg:                 aka "Marcel Apfelbaum <marcel@redhat.com>" [marginal]
# gpg:                 aka "Marcel Apfelbaum <marcel.apfelbaum@gmail.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B1C6 3A57 F92E 08F2 640F  31F5 36D4 C0F0 CF2F E46D

* remotes/marcel/tags/pvrdma-04-07-2021-v2:
  pvrdma: Fix the ring init error flow (CVE-2021-3608)
  pvrdma: Ensure correct input on ring init (CVE-2021-3607)
  hw/rdma: Fix possible mremap overflow in the pvrdma device (CVE-2021-3582)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-05 09:58:00 +01:00
Marcel Apfelbaum
66ae37d8cc pvrdma: Fix the ring init error flow (CVE-2021-3608)
Do not unmap uninitialized dma addresses.

Fixes: CVE-2021-3608
Reviewed-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Tested-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <20210630115246.2178219-1-marcel@redhat.com>
Tested-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2021-07-04 22:47:51 +03:00
Marcel Apfelbaum
32e5703cfe pvrdma: Ensure correct input on ring init (CVE-2021-3607)
Check the guest passed a non zero page count
for pvrdma device ring buffers.

Fixes: CVE-2021-3607
Reported-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Reviewed-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <20210630114634.2168872-1-marcel@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Tested-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2021-07-04 22:47:51 +03:00
Marcel Apfelbaum
284f191b4a hw/rdma: Fix possible mremap overflow in the pvrdma device (CVE-2021-3582)
Ensure mremap boundaries not trusting the guest kernel to
pass the correct buffer length.

Fixes: CVE-2021-3582
Reported-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Tested-by: VictorV (Kunlun Lab) <vv474172261@gmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <20210616110600.20889-1-marcel.apfelbaum@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Tested-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2021-07-04 22:47:51 +03:00
Peter Maydell
711c0418c8 MIPS patches queue
- Extract nanoMIPS, microMIPS, Code Compaction from translate.c
 - Allow PCI config accesses smaller than 32-bit on Bonito64 device
 - Fix migration of g364fb device on Jazz Magnum
 - Fix dp8393x PROM checksum on Jazz Magnum and Quadra 800
 - Map the UART devices unconditionally on Jazz Magnum
 - Add functional test booting Linux on the Fuloong 2E
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmDfMnMACgkQ4+MsLN6t
 wN71nhAArpyoJ5mTkt54wAxZwxvqjWAyesvogV4pLIvOyNJmQcExY/Ly8Qb5dbDg
 2PEhCpDU7GlT7oCfgh7O5KrEjnqVfmHQzzbvQ0Ygq9kL5hdjaSSlHB/yeirU7CR1
 cMQXfj9kvRVa5Oayt3L+kiKgTA0f1vbGmnveiFxJKJupyVDtursstD3nSZCThSVL
 FWdFJbqLzvTY3cQqLBVq7jEnN7LzSYeYnq8Tvri0nuwoBwLJY382IljqsQZrGHzr
 Bbya3KFInMrQK5VAM0pOkfvPYXZmtJ8i8VuR6S+IdICiZ+61sknKRUq5z09/4NXA
 HaxarWO/fv07qd7q6Z2+i5Q6fiDrV4p2qfHeddM4Xwqvu8O98EPhaBE3veLOiNgr
 VxgkJFslI1gstje31qqvNjFxB+cOIBYjWTlIVu1xOuOKGWMjMT+9XcVLseweA2rT
 H/nTKnWTAiJ/mxT4KIv59SS0ZQa4QJ3CjYr26AcQ9YrJ9vCMay8rLPEn0iVlhB2H
 ZyW4Be84ynEvuAvvuWt1AXvFjT41Zqj4Px1M6Pa15e9eV6guiW1KV13Thb45gJqt
 LTQtoME83r3gUhQOcoZaowYh6ffCbjtW3eAcTZP3Zu7fOeBjSye+CvS9NJMtK3Vj
 0/YjtqGPUvjNy4sR9VqkRRPMZpHftMTRILxaFm8Y5tlThkAydw0=
 =rR+B
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/philmd/tags/mips-20210702' into staging

MIPS patches queue

- Extract nanoMIPS, microMIPS, Code Compaction from translate.c
- Allow PCI config accesses smaller than 32-bit on Bonito64 device
- Fix migration of g364fb device on Jazz Magnum
- Fix dp8393x PROM checksum on Jazz Magnum and Quadra 800
- Map the UART devices unconditionally on Jazz Magnum
- Add functional test booting Linux on the Fuloong 2E

# gpg: Signature made Fri 02 Jul 2021 16:36:19 BST
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* remotes/philmd/tags/mips-20210702:
  hw/mips/jazz: Map the UART devices unconditionally
  hw/mips/jazz: specify correct endian for dp8393x device
  hw/m68k/q800: fix PROM checksum and MAC address storage
  qemu/bitops.h: add bitrev8 implementation
  dp8393x: remove onboard PROM containing MAC address and checksum
  hw/m68k/q800: move PROM and checksum calculation from dp8393x device to board
  hw/mips/jazz: move PROM and checksum calculation from dp8393x device to board
  dp8393x: convert to trace-events
  dp8393x: checkpatch fixes
  g364fb: add VMStateDescription for G364SysBusState
  g364fb: use RAM memory region for framebuffer
  tests/acceptance: Test Linux on the Fuloong 2E machine
  hw/pci-host/bonito: Allow PCI config accesses smaller than 32-bit
  hw/pci-host/bonito: Trace PCI config accesses smaller than 32-bit
  target/mips: Extract nanoMIPS ISA translation routines
  target/mips: Extract the microMIPS ISA translation routines
  target/mips: Extract Code Compaction ASE translation routines
  target/mips: Add declarations for generic TCG helpers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-04 14:04:12 +01:00
Peter Maydell
73c8bf4ccf target-arm queue:
* more MVE instructions
  * hw/gpio/gpio_pwr: use shutdown function for reboot
  * target/arm: Check NaN mode before silencing NaN
  * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
  * hw/arm: Add basic power management to raspi.
  * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmDfDacZHHBldGVyLm1h
 eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3hLREACwXLyfUWdqWpbW2wAHU4n9
 SlNJjaKval707CBYQL03xvQyrSGinjaLmbljjE4Y3KV+XK/cRWXe/1bimBokk9Xt
 8blX2OlEkBViqyTqPcSLU0AMMzoTYCRHMlR9tVbz4h39Z/yfcxzDKvWjjpor/vtw
 +0G8N/Z/O3TL2gSDAb7fQcdhS4pwsTBFOkOgGv06WxWUAidHxuguJOn0DF+mr1Tl
 hroz8fivG2RfQL/OVE6GJgD+VIlw+Ea4Znnuue+gdoCCvY4TE79K6kGSV9RZVg4d
 VGtzNd5m9UGFaeU+jof/45hD4ex4x+c5ggH6fDwx6pJhKpxYwsJBMvE8U4I2TfsG
 Hf7EjODxvJ5I78/qCyYJ+be2ZnqMcfqPVhU3MwqBwGMIuYMJRWH2b4eD3PmRWPBf
 fxrlEZ9PvzaONzwFDHJ06XiMhxByVF+kK4XNtBZZMWH0Jte8e1TPnY26PfNtvnfE
 lQYf4qETx3A2aYIG5zjXiIhGrH5/OevIBOkVKWvdEHXbVN0dt0JcjUbZee6MG9Q1
 BN0xXlRQ2btsgYTvGVg08mkMi6LWl55j0CK/sx1qZPjnbgEyPq90kk6FKyflJhR5
 fUdW1qCn+xRERMH+m8xIvkGzYTlmPal86BRwxBxyGrVRHpXDgWE3sw/m6p6wMsQZ
 Szx8eMJEAX5GNbF7M+KmCg==
 =mvVV
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210702' into staging

target-arm queue:
 * more MVE instructions
 * hw/gpio/gpio_pwr: use shutdown function for reboot
 * target/arm: Check NaN mode before silencing NaN
 * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 * hw/arm: Add basic power management to raspi.
 * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc

# gpg: Signature made Fri 02 Jul 2021 13:59:19 BST
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20210702: (24 commits)
  target/arm: Implement MVE shifts by register
  target/arm: Implement MVE shifts by immediate
  target/arm: Implement MVE long shifts by register
  target/arm: Implement MVE long shifts by immediate
  target/arm: Implement MVE VADDLV
  target/arm: Implement MVE VSHLC
  target/arm: Implement MVE saturating narrowing shifts
  target/arm: Implement MVE VSHRN, VRSHRN
  target/arm: Implement MVE VSRI, VSLI
  target/arm: Implement MVE VSHLL
  target/arm: Implement MVE vector shift right by immediate insns
  target/arm: Implement MVE vector shift left by immediate insns
  target/arm: Implement MVE logical immediate insns
  target/arm: Use dup_const() instead of bitfield_replicate()
  target/arm: Use asimd_imm_const for A64 decode
  target/arm: Make asimd_imm_const() public
  target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
  target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
  hw/gpio/gpio_pwr: use shutdown function for reboot
  target/arm: Check NaN mode before silencing NaN
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-03 22:34:37 +01:00
Vincent Bernat
9e2423ef58 docs: add slot when adding new PCIe root port
Without providing a specific slot, QEMU won't be able to create the
second additional PCIe root port with the following error:

    $ qemu-system-x86_64 [...] -machine q35 \
    >    -device pcie-root-port,bus=pcie.0,id=rp1 \
    >    -device pcie-root-port,bus=pcie.0,id=rp2
    qemu-system-x86_64: -device pcie-root-port,bus=pcie.0,id=rp2:
    Can't add chassis slot, error -16

This is due to the fact they both try to use slot 0. Update the
documentation to specify a slot for each new PCIe root port.

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Message-Id: <20210614114357.1146725-1-vincent@bernat.ch>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Gerd Hoffmann
ee80f5ba22 acpi/ged: fix reset cause
Reset requests should use SHUTDOWN_CAUSE_GUEST_RESET not
SHUTDOWN_CAUSE_GUEST_SHUTDOWN.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20210624110057.2398779-1-kraxel@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Igor Mammedov
40f23e4e52 tests: acpi: pc: update expected DSDT blobs
@@ -930,20 +930,20 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 0x00000001)
             Device (S00)
             {
                 Name (_ADR, Zero)  // _ADR: Address
-                Name (_SUN, Zero)  // _SUN: Slot User Number
+                Name (ASUN, Zero)
                 Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
                 {
-                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, ASUN))
                 }
             }

             Device (S10)
             {
                 Name (_ADR, 0x00020000)  // _ADR: Address
-                Name (_SUN, 0x02)  // _SUN: Slot User Number
+                Name (ASUN, 0x02)
                 Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
                 {
-                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, ASUN))
                 }

                 Method (_S1D, 0, NotSerialized)  // _S1D: S1 Device State

with a hank per bridge:

@@ -965,10 +965,10 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPC    ", 0x00000001)
             Device (S18)
             {
                 Name (_ADR, 0x00030000)  // _ADR: Address
-                Name (_SUN, 0x03)  // _SUN: Slot User Number
+                Name (ASUN, 0x03)
                 Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
                 {
-                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, _SUN))
+                    Return (PDSM (Arg0, Arg1, Arg2, Arg3, BSEL, ASUN))
                 }
             }

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210624204229.998824-4-imammedo@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Sucaet <john.sucaet@ekinops.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Igor Mammedov
7193d7cdd9 acpi: pc: revert back to v5.2 PCI slot enumeration
Commit [1] moved _SUN variable from only hot-pluggable to
all devices. This made linux kernel enumerate extra slots
that weren't present before. If extra slot happens to be
be enumerated first and there is a device in th same slot
but on other bridge, linux kernel will add -N suffix to
slot name of the later, thus changing NIC name compared to
QEMU 5.2. This in some case confuses systemd, if it is
using SLOT NIC naming scheme and interface name becomes
not the same as it was under QEMU-5.2.

Reproducer QEMU CLI:
  -M pc-i440fx-5.2 -nodefaults \
  -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x3 \
  -device virtio-net-pci,id=nic1,bus=pci.1,addr=0x1 \
  -device virtio-net-pci,id=nic2,bus=pci.1,addr=0x2 \
  -device virtio-net-pci,id=nic3,bus=pci.1,addr=0x3

with RHEL8 guest produces following results:
  v5.2:
     kernel: virtio_net virtio0 ens1: renamed from eth0
     kernel: virtio_net virtio2 ens3: renamed from eth2
     kernel: virtio_net virtio1 enp1s2: renamed from eth1
      (slot 2 is assigned to empty bus 0 slot and virtio1
       is assigned to 2-2 slot, and renaming falls back,
       for some reason, to path based naming scheme)

  v6.0:
     kernel: virtio_net virtio0 ens1: renamed from eth0
     kernel: virtio_net virtio2 ens3: renamed from eth2
     systemd-udevd[299]: Error changing net interface name 'eth1' to 'ens3': File exists
     systemd-udevd[299]: could not rename interface '3' from 'eth1' to 'ens3': File exists
      (with commit [1] kernel assigns virtio2 to 3-2 slot
       since bridge advertises _SUN=0x3 and kernel assigns
       slot 3 to bridge. Still it manages to rename virtio2
       correctly to ens3, however systemd gets confused with virtio1
       where slot allocation exactly the same (2-2) as in 5.2 case
       and tries to rename it to ens3 which is rightfully taken by
       virtio2)

I'm not sure what breaks in systemd interface renaming (it probably
should be investigated), but on QEMU side we can safely revert
_SUN to 5.2 behavior (i.e. avoid cold-plugged bridges and non
hot-pluggable device classes), without breaking acpi-index, which uses
slot numbers but it doesn't have to use _SUN, it could use an arbitrary
variable name that has the same slot value).
It will help existing VMs to keep networking with non trivial
configs in working order since systemd will do its interface
renaming magic as it used to do.

1)
Fixes: b7f23f62e40 (pci: acpi: add _DSM method to PCI devices)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210624204229.998824-3-imammedo@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Sucaet <john.sucaet@ekinops.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Igor Mammedov
a4344574fd tests: acpi: prepare for changing DSDT tables
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210624204229.998824-2-imammedo@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Sucaet <john.sucaet@ekinops.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Laurent Vivier
109c20ea28 migration: failover: reset partially_hotplugged
When the card is plugged back, reset the partially_hotplugged flag to false

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1787194
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210629152937.619193-1-lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 03:12:35 -04:00
Andrew Melnychenko
df07a8f8cb virtio-pci: Changed return values for "notify", "device" and "isr" read.
At some point, after unplugging virtio-pci the virtio device may be unrealised,
but the memory regions may be present in flatview. So, it's a possible situation
when memory region's callbacks are called for "unplugged" device.

Previous two patches made sure this case does not cause QEMU to crash.
This patch adds check for "notify" memory region. Now reads will return "-1" if a virtio
device is not present on a virtio bus.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1938042
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1743098

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20210609095843.141378-4-andrew@daynix.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 01:39:33 -04:00
Andrew Melnychenko
bf697371db virtio-pci: Added check for virtio device in PCI config cbs.
Now, if virtio device is not present on virtio-bus - pci config callbacks
will not lead to possible crush. The read will return "-1" which should be
interpreted by a driver that pci device may be unplugged.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20210609095843.141378-3-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 01:39:33 -04:00
Andrew Melnychenko
80ebfd69b9 virtio-pci: Added check for virtio device presence in mm callbacks.
During unplug the virtio device is unplugged from virtio-bus on pci. In some cases,
requests to virtio-pci mm may acquire during/after unplug. Added check that virtio
device is on the bus, for "common" memory region.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Message-Id: <20210609095843.141378-2-andrew@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 01:39:33 -04:00
Philippe Mathieu-Daudé
9b0ca75e01 hw/pci-host/q35: Ignore write of reserved PCIEXBAR LENGTH field
libFuzzer triggered the following assertion:

  cat << EOF | qemu-system-i386 -M pc-q35-5.0 \
    -nographic -monitor none -serial none \
    -qtest stdio -d guest_errors -trace pci\*
  outl 0xcf8 0xf2000060
  outl 0xcfc 0x8400056e
  EOF
  pci_cfg_write mch 00:0 @0x60 <- 0x8400056e
  Aborted (core dumped)

This is because guest wrote MCH_HOST_BRIDGE_PCIEXBAR_LENGTH_RVD
(reserved value) to the PCIE XBAR register.

There is no indication on the datasheet about what occurs when
this value is written. Simply ignore it on QEMU (and report an
guest error):

  pci_cfg_write mch 00:0 @0x60 <- 0x8400056e
  Q35: Reserved PCIEXBAR LENGTH
  pci_cfg_read mch 00:0 @0x0 -> 0x8086
  pci_cfg_read mch 00:0 @0x0 -> 0x29c08086
  ...

Cc: qemu-stable@nongnu.org
Reported-by: Alexander Bulekov <alxndr@bu.edu>
BugLink: https://bugs.launchpad.net/qemu/+bug/1878641
Fixes: df2d8b3ed4 ("q35: Introduce q35 pc based chipset emulator")
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210526142438.281477-1-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03 01:39:33 -04:00
Philippe Mathieu-Daudé
a13bfa5a05 hw/mips/jazz: Map the UART devices unconditionally
When using the Magnum ARC firmware we can see accesses to the
UART1 being rejected, because the device is not mapped:

  $ qemu-system-mips64el -M magnum -d guest_errors,unimp -bios NTPROM.RAW
  Invalid access at addr 0x80007004, size 1, region '(null)', reason: rejected
  Invalid access at addr 0x80007001, size 1, region '(null)', reason: rejected
  Invalid access at addr 0x80007002, size 1, region '(null)', reason: rejected
  Invalid access at addr 0x80007003, size 1, region '(null)', reason: rejected
  Invalid access at addr 0x80007004, size 1, region '(null)', reason: rejected

Since both UARTs are present (soldered on the board) regardless
of whether there are character devices connected, map them
unconditionally.

(This code pre-dated commit 12051d82f004 which made it safe to pass
NULL in as a chardev to serial devices.)

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210629053704.2584504-1-f4bug@amsat.org>
2021-07-02 17:35:08 +02:00
Mark Cave-Ayland
b1600ff195 hw/mips/jazz: specify correct endian for dp8393x device
The MIPS magnum machines are available in both big endian (mips64) and little
endian (mips64el) configurations. Ensure that the dp893x big_endian property
is set accordingly using logic similar to that used for the MIPS malta
machines.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Finn Thain <fthain@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210625065401.30170-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-07-02 17:35:08 +02:00
Mark Cave-Ayland
846feac2ae hw/m68k/q800: fix PROM checksum and MAC address storage
The checksum used by MacOS to validate the PROM content is an exclusive-OR
rather than a sum over the corresponding bytes. In addition the MAC address
must be stored in bit-reversed format as indicated in comments in Linux's
macsonic.c.

With the PROM contents fixed MacOS starts to probe the device registers
when AppleTalk is enabled in the Control Panel.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Finn Thain <fthain@linux-m68k.org>
Message-Id: <20210625065401.30170-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-07-02 17:35:08 +02:00