When an IOThread is configured, the ctrl virtqueue is processed in the
IOThread. TMFs that reset SCSI devices are currently called directly
from the IOThread and trigger an assertion failure in blk_drain() from
the following call stack:
virtio_scsi_handle_ctrl_req -> virtio_scsi_do_tmf -> device_cold_reset
-> scsi_disk_reset -> scsi_device_purge_requests -> blk_drain
../block/block-backend.c:1780: void blk_drain(BlockBackend *): Assertion `qemu_in_main_thread()' failed.
The blk_drain() function is not designed to be called from an IOThread
because it needs the Big QEMU Lock (BQL).
This patch defers TMFs that reset SCSI devices to a Bottom Half (BH)
that runs in the main loop thread under the BQL. This way it's safe to
call blk_drain() and the assertion failure is avoided.
Introduce s->tmf_bh_list for tracking TMF requests that have been
deferred to the BH. When the BH runs it will grab the entire list and
process all requests. Care must be taken to clear the list when the
virtio-scsi device is reset or unrealized. Otherwise deferred TMF
requests could execute later and lead to use-after-free or other
undefined behavior.
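A minimal single-threaded sketch of the deferral pattern described above,
not the actual virtio-scsi code. All names (SketchReq, SketchDev,
defer_req, grab_all, bh_run, dev_reset) are hypothetical, and the locking
that a real multi-threaded device needs is deliberately left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred TMF request. */
typedef struct SketchReq {
    struct SketchReq *next;
} SketchReq;

/* Hypothetical stand-in for the device state that owns the list. */
typedef struct {
    SketchReq *head;
    SketchReq **tail;
    int processed;              /* requests handled by the bottom half */
} SketchDev;

static void dev_init(SketchDev *d)
{
    d->head = NULL;
    d->tail = &d->head;
    d->processed = 0;
}

/* Where the request arrives: queue it instead of handling it in place. */
static void defer_req(SketchDev *d, SketchReq *r)
{
    r->next = NULL;
    *d->tail = r;
    d->tail = &r->next;
}

/* Detach the whole list in one step, leaving the device list empty. */
static SketchReq *grab_all(SketchDev *d)
{
    SketchReq *list = d->head;
    d->head = NULL;
    d->tail = &d->head;
    return list;
}

/* Bottom half body: grab the entire list, then process every request. */
static void bh_run(SketchDev *d)
{
    SketchReq *r = grab_all(d);
    while (r) {
        SketchReq *next = r->next;
        d->processed++;
        r = next;
    }
}

/* Reset/unrealize path: drop pending requests so none can run later. */
static void dev_reset(SketchDev *d)
{
    (void)grab_all(d);
}
```

Because dev_reset() empties the list, a bottom half that runs afterwards
finds nothing to process, which is the property the paragraph above asks
for.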
The s->resetting counter that's used by TMFs that reset SCSI devices is
accessed from multiple threads. This patch makes that explicit by using
atomic accessor functions. With this patch applied the counter is only
modified by the main loop thread under the BQL but can be read by any
thread.
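As an illustration of a counter with one writer and many readers, here is
a sketch using C11 <stdatomic.h>. It is an assumption-laden stand-in: the
type and function names are hypothetical, and it does not use QEMU's own
atomic accessor functions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device state with a counter shared between threads. */
typedef struct {
    atomic_int resetting;
} SketchDev;

/* Writer side: in this sketch, called only by one designated thread. */
static void reset_begin(SketchDev *d)
{
    atomic_fetch_add(&d->resetting, 1);
}

static void reset_end(SketchDev *d)
{
    atomic_fetch_sub(&d->resetting, 1);
}

/* Reader side: may be called from any thread. */
static bool is_resetting(SketchDev *d)
{
    return atomic_load(&d->resetting) > 0;
}
```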
Reported-by: Qing Wang <qinwang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-4-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
dma_blk_cb() only takes the AioContext lock around ->io_func(). That
means the rest of dma_blk_cb() is not protected. In particular, the
DMAAIOCB field accesses happen outside the lock.
There is a race when the main loop thread holds the AioContext lock and
invokes scsi_device_purge_requests() -> bdrv_aio_cancel() ->
dma_aio_cancel() while an IOThread executes dma_blk_cb(). The dbs->acb
field determines how cancellation proceeds. If dma_aio_cancel() sees
dbs->acb == NULL while dma_blk_cb() is still running, the request can be
completed twice (-ECANCELED and the actual return value).
The following assertion can occur with virtio-scsi when an IOThread is
used:
../hw/scsi/scsi-disk.c:368: scsi_dma_complete: Assertion `r->req.aiocb != NULL' failed.
Fix the race by holding the AioContext lock across dma_blk_cb(). Now
dma_aio_cancel() under the AioContext lock will not see
inconsistent/intermediate states.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-3-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If requests are being processed in the IOThread when a SCSIDevice is
unplugged, scsi_device_purge_requests() -> scsi_req_cancel_async() races
with I/O completion callbacks. Both threads load and store req->aiocb.
This can lead to assert(r->req.aiocb == NULL) failures and undefined
behavior.
Protect r->req.aiocb with the AioContext lock to prevent the race.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230221212218.1378734-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
The stubbed out Rocker monitor commands are the last remaining users
of QERR_FEATURE_DISABLED. They fail like this:
(qemu) info rocker mumble
Error: The feature 'rocker' is not enabled
The real rocker commands fail like this when the named object doesn't
exist:
Error: rocker mumble not found
If that's good enough when Rocker is enabled, then it's good enough
when it's disabled, so replace QERR_FEATURE_DISABLED with that, and
drop the macro.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-13-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
Get rid of a use of QERR_FEATURE_DISABLED, and improve the slightly
awkward error message
(qemu) info hotpluggable-cpus
Error: The feature 'query-hotpluggable-cpus' is not enabled
to
Error: machine does not support hot-plugging CPUs
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-11-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
qmp_query_vm_generation_id() in stubs/vmgenid.c is the last user of
QERR_UNSUPPORTED outside qga/. Unlike the stubs we just dropped, it
is actually reachable, namely when CONFIG_ACPI_VMGENID is off. It
always fails like
(qemu) info vm-generation-id
Error: this feature or command is not currently supported
Turns out the real qmp_query_vm_generation_id() doesn't actually
depend on CONFIG_ACPI_VMGENID, and fails safely when it's off. Move
it to hw/core/machine-qmp-cmds.c, and drop the stub. The error
message becomes
Error: VM Generation ID device not found
Feels like an improvement to me.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-8-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
acpi_table_add() is only ever called on behalf of CLI option
-acpitable. Since qemu-options.hx sets @arch_mask to QEMU_ARCH_I386,
it is reachable only for these targets. Since they provide a real
acpi_table_add(), the stub is unreachable.
There's no point in unreachable code keeping QERR_UNSUPPORTED alive.
Dumb it down to g_assert_not_reached().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-7-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
smbios_entry_add() is only ever called on behalf of CLI option
-smbios. Since qemu-options.hx sets @arch_mask to QEMU_ARCH_I386 |
QEMU_ARCH_ARM, it is reachable only for these targets. Since they
provide a real smbios_entry_add(), the stub is unreachable.
There's no point in unreachable code keeping QERR_UNSUPPORTED alive.
Dumb it down to g_assert_not_reached().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-6-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
We've been trying to reduce their remaining use.
Get rid of a use of QERR_UNSUPPORTED, and improve the rather vague
error message
(qemu) nmi
Error: this feature or command is not currently supported
to
Error: machine does not provide NMIs
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-5-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230207075115.1525-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging
# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJj7xKYAAoJEO8Ells5jWIRDZQH/Rao24sq3j97qE5RzekvANzq
# GnHUyLnl3yeOSNumv2BJInZTvgUpYL2etGQr3DtGRwOrr7w1vKB3zhY3V3jQefkh
# f4rsEGkamL/qM2N2cGUIUSqevo7OGnP8aQojpEi4MWWZ30B3L6jqd4NqyA1gyndV
# 1eBkpR+BY2PjcLbgvFUZEXeAn/vapE5NKULXUGhg5mMvgwYH3CgZXpqqkxr876za
# S4rZMtReXKNeid14Z35SUjJdV2WKYmo/lN9+GQxF2YNLmDC3RtuFQVm038erSqvs
# uLVSg8tiIlCyOcSDpR/BARNrxVwzlJp5X6ocapHubS/i0Rp/Zo7ezSk/XWH1gfU=
# =UbzF
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Feb 2023 05:37:28 GMT
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* tag 'net-pull-request' of https://github.com/jasowang/qemu:
vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check
net: stream: add a new option to automatically reconnect
vmnet: stop recieving events when VM is stopped
net: Increase L2TPv3 buffer to fit jumboframes
hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value
hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort
net: Replace "Supported NIC models" with "Available NIC models"
net: Restore printing of the help text with "-nic help"
net: Move the code to collect available NIC models to a separate function
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge tag 'pr-2023-02-16' of https://gitlab.com/a1xndr/qemu into staging
Replace fork-based fuzzing with reboots.
Now the fuzzers will reboot the guest between inputs.
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEE+tTiv4cTddY0BRfETmYd3lg6lk4FAmPu/LoACgkQTmYd3lg6
# lk6RHg/7BRGI5ZPXb1MmTNCC+SroQ6TT++lO4b0hbkN2HO6U+WVvfuA6+0wg+8qC
# 4bp+G1Tabpcq1MTYUuim6DBtWswgpqr0AbWNwn1eF7hya+3W9woH2POVYY2wwc7m
# S3EdwXCCKo9gGXlaNrotnbwIk+o8B4BzXOXLIlRtg26wGYhT5fkJA/BQcHKDXz37
# ctyWxlyjIM8pNCgfybMvjC7MYtp8DufPsv/rrKx9t0TM7f1jPVgXLek7t0+ZwjeY
# qz2Om2jiij1INgK9hTieWs4eHwpwre6vH2a+JKRkZ3sS7WYcj1auNKVJb3GvDqmc
# wy+Nz5Lz4+aPP19pkCYjfz5w3CqEEsSlSDn5UVRbfl2fbENSceoNwo9huMXsF1pB
# oO6NK2NxbOygmNpYxp+JEt45KFIXzUcIFQwbn8aCDODIl+0H2yu7/ll6XgELf1Pa
# P83THOaVxIxfcI9VOdt/FwDq1ZzmV5nk/BkIGJeIWNYMbU4Gze6YoaL3U8AHDxKH
# f6f3qDzcVJjqD0wKhvYcQ3kSPq+vHc/ioh6mYwos6VUEVYz/SLOY876MaSB/K4PE
# ofBV7y6HvJ6AMwg1TBg4YtOP08gWK+4sYH+I09oU40U3UcwEpkbkQTF72lPQHxFs
# 8UVRJrgWv/xzrwzXTX5ruQ633F8zuhqQTeERqksj1pPHJ3NdHps=
# =F6qI
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 17 Feb 2023 04:04:10 GMT
# gpg: using RSA key FAD4E2BF871375D6340517C44E661DDE583A964E
# gpg: Good signature from "Alexander Bulekov <alxndr@bu.edu>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FAD4 E2BF 8713 75D6 3405 17C4 4E66 1DDE 583A 964E
* tag 'pr-2023-02-16' of https://gitlab.com/a1xndr/qemu:
docs/fuzz: remove mentions of fork-based fuzzing
fuzz: remove fork-fuzzing scaffolding
fuzz/i440fx: remove fork-based fuzzer
fuzz/virtio-blk: remove fork-based fuzzer
fuzz/virtio-net: remove fork-based fuzzer
fuzz/virtio-scsi: remove fork-based fuzzer
fuzz/generic-fuzz: add a limit on DMA bytes written
fuzz/generic-fuzz: use reboots instead of forks to reset state
fuzz: add fuzz_reset API
hw/sparse-mem: clear memory on reset
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently, VMXNET3_MAX_MTU itself (being 9000) is not considered a
valid value for the MTU, but a guest running ESXi 7.0 might try to
set it and fail the assert [0].
In the Linux kernel, dev->max_mtu itself is a valid value for the MTU
and for the vmxnet3 driver it's 9000, so a guest running Linux will
also fail the assert when trying to set an MTU of 9000.
VMXNET3_MAX_MTU and s->mtu don't seem to be used in relation to buffer
allocations/accesses, so allowing the upper limit itself as a value
should be fine.
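The difference between the two range checks can be sketched as follows.
The 9000 limit comes from the text above; the lower bound of 60 and all
identifiers are assumptions made for this illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_MTU 60u      /* assumed lower bound, illustration only */
#define SKETCH_MAX_MTU 9000u    /* value of VMXNET3_MAX_MTU per the text */

/* Check that rejects the upper limit itself. */
static bool mtu_valid_exclusive(uint32_t mtu)
{
    return mtu >= SKETCH_MIN_MTU && mtu < SKETCH_MAX_MTU;
}

/* Check that accepts the upper limit itself. */
static bool mtu_valid_inclusive(uint32_t mtu)
{
    return mtu >= SKETCH_MIN_MTU && mtu <= SKETCH_MAX_MTU;
}
```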
[0]: https://forum.proxmox.com/threads/114011/
Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during activate (CVE-2021-20203)")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch replaces hw_error() with a guest error log for [read|write]b
accesses when mode_16bit is enabled. This avoids aborting QEMU.
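A sketch of the general idea, assuming a read handler that dispatches on
access size. The handler, the logging helper and the returned values are
all hypothetical and are not taken from the lan9118 model:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static unsigned guest_errors;   /* number of guest errors logged so far */

/* Hypothetical logging helper: record the error and keep running. */
static void log_guest_error(const char *what, unsigned size)
{
    guest_errors++;
    fprintf(stderr, "guest error: %s (size %u)\n", what, size);
}

/* Hypothetical 16-bit mode read handler. */
static uint64_t sketch_read(unsigned size)
{
    switch (size) {
    case 2:
        return 0x1234;          /* placeholder 16-bit value */
    case 4:
        return 0x12345678;      /* placeholder 32-bit value */
    default:
        /* Unsupported size: log it and return 0 instead of aborting. */
        log_guest_error("unsupported read size", size);
        return 0;
    }
}
```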
Fixes: 1248f8d4cb ("hw/lan9118: Add basic 16-bit mode support.")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1433
Reported-by: Qiang Liu <cyruscyliu@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Qiang Liu <cyruscyliu@gmail.com>
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The code that collects the available NIC models is not really specific
to PCI anymore and will be required in the next patch, too, so let's
move this into a new separate function in net.c instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We use sparse-mem for fuzzing. For long-running fuzzing processes, we
eventually end up with many allocated sparse-mem pages. To avoid this,
clear the allocated pages on system-reset.
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Sort the migration section of VFIO trace events file alphabetically
and move two misplaced traces to common.c section.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-11-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that v2 protocol implementation has been added, remove the
deprecated v1 implementation.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-10-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.
The two protocols, v1 and v2, will co-exist and in the following patches
v1 protocol code will be removed.
There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
of a bitmap.
- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
ioctl and normal read() and write() instead of the migration region.
- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
added later on.
Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].
[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-9-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-8-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Move vfio_dev_get_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to v1 protocol and moving
it will make it easier to add the v2 protocol implementation later.
No functional changes intended.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-7-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently VFIO migration doesn't implement any kind of intermediate
quiescent state in which P2P DMAs are quiesced before stopping or
running the device. This can cause problems in multi-device migration
where the devices are doing P2P DMAs, since the devices are not stopped
together at the same time.
Until such support is added, block migration of multiple devices.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-6-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_devices_all_running_and_saving() is used to check if migration is
in pre-copy phase. This is done by checking if migration is in setup or
active states and if all VFIO devices are in pre-copy state, i.e.
_SAVING | _RUNNING.
In VFIO migration protocol v2 pre-copy support is made optional. Hence,
a matching v2 protocol pre-copy state can't be used here.
As preparation for adding v2 protocol, change
vfio_devices_all_running_and_saving() logic such that it doesn't use the
VFIO pre-copy state.
The new equivalent logic checks if migration is in active state and if
all VFIO devices are in running state [1]. No functional changes
intended.
[1] Note that checking if migration is in setup or active states and if
all VFIO devices are in running state doesn't guarantee that we are in
pre-copy phase, thus we check if migration is only in active state.
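The new check can be sketched as a small predicate. The enum, struct and
function names below are hypothetical simplifications, not the QEMU types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of migration states. */
typedef enum { SKETCH_MIG_SETUP, SKETCH_MIG_ACTIVE, SKETCH_MIG_OTHER } SketchMigState;

/* Hypothetical per-device state: only "running" matters here. */
typedef struct {
    bool running;
} SketchVfioDev;

/* True only if migration is active and every device is running. */
static bool all_running_and_mig_active(SketchMigState mig,
                                       const SketchVfioDev *devs, size_t n)
{
    if (mig != SKETCH_MIG_ACTIVE) {
        return false;           /* setup alone is not treated as pre-copy */
    }
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].running) {
            return false;
        }
    }
    return true;
}
```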
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-5-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.
However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.
This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.
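A minimal sketch of the fallback, assuming a byte-array dirty bitmap. The
function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill `bitmap` with the dirty state to report. If the IOMMU tracks
 * dirty pages, copy its bitmap; otherwise conservatively mark every
 * page dirty so that no modified page can be missed.
 */
static void report_dirty(uint8_t *bitmap, size_t nbytes,
                         bool iommu_tracks_dirty, const uint8_t *iommu_bitmap)
{
    if (iommu_tracks_dirty) {
        memcpy(bitmap, iommu_bitmap, nbytes);
    } else {
        memset(bitmap, 0xff, nbytes);
    }
}
```

Marking everything dirty is always correct but costs extra transfer, since
pages that were never touched are sent again.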
The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.
This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.
[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Link: https://lore.kernel.org/r/20230216143630.25610-4-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.
For example, this can happen if VM is started or stopped not during
migration and vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.
Fix it by checking that MigrationState->to_dst_file is set before using
it.
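The shape of the fix can be sketched like this. SketchFile,
SketchMigState and report_error() are hypothetical stand-ins, not the
QEMU definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the migration stream. */
typedef struct {
    int last_error;
} SketchFile;

/* Hypothetical stand-in for the migration state. */
typedef struct {
    SketchFile *to_dst_file;    /* NULL when no migration is in progress */
} SketchMigState;

/* Error flow: record the error on the stream only if a stream exists. */
static void report_error(SketchMigState *ms, int err)
{
    if (ms->to_dst_file) {
        ms->to_dst_file->last_error = err;
    }
}
```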
Fixes: 02a7e71b1e ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Link: https://lore.kernel.org/r/20230216143630.25610-3-avihaih@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that the cortex-a15 is under CONFIG_TCG, use the 'max' CPU as the
default CPU for a KVM-only build.
Note that we cannot use 'host' here because the qtests can run without
any other accelerator (than qtest) and 'host' depends on KVM being
enabled.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Addresses targeting the second translation table (TTB1) in the SMMU have
all upper bits set (except for the top byte when TBI is enabled). Fix
the TTB1 check.
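A standalone sketch of such a check, assuming `tsz` is the number of upper
address bits that select the table and that 8 < tsz < 64. The function name
and parameters are hypothetical and this is not the SMMU model's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the upper `tsz` bits of `iova` are all ones. When `tbi`
 * is set, the top byte (bits 63..56) is ignored and only the remaining
 * upper bits are checked.
 */
static bool targets_ttb1(uint64_t iova, unsigned tsz, bool tbi)
{
    unsigned ignored = tbi ? 8 : 0;
    unsigned nbits = tsz - ignored;     /* number of bits to check */
    unsigned shift = 64 - tsz;          /* lowest bit to check */
    uint64_t mask = ((UINT64_C(1) << nbits) - 1) << shift;

    return (iova & mask) == mask;
}
```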
Reported-by: Ola Hugosson <ola.hugosson@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-id: 20230214171921.1917916-3-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Addresses targeting the second translation table (TTB1) in the SMMU have
all upper bits set. Ensure the IOMMU region covers all 64 bits.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230214171921.1917916-2-jean-philippe@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Nuvoton's PSPI is a general purpose SPI module which enables
connections to SPI-based peripheral devices.
Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Chris Rauer <crauer@google.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@linaro.org>
Message-id: 20230208235433.3989937-3-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Just use current_accel_name() directly.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since commit acc0b8b05a when running the ZynqMP ZCU102 board with
a QEMU configured using --without-default-devices, we get:
$ qemu-system-aarch64 -M xlnx-zcu102
qemu-system-aarch64: missing object type 'usb_dwc3'
Abort trap: 6
Fix by adding the missing Kconfig dependency.
Fixes: acc0b8b05a ("hw/arm/xlnx-zynqmp: Connect ZynqMP's USB controllers")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230216092327.2203-1-philmd@linaro.org
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
GBPA register can be used to globally abort all
transactions.
It is described in the SMMU manual in "6.3.14 SMMU_GBPA".
ABORT reset value is IMPLEMENTATION DEFINED; it is chosen to
be zero (do not abort incoming transactions).
Other fields have default values of Use Incoming.
If UPDATE is not set, the write is ignored. This is the only permitted
behavior in SMMUv3.2 and later (6.3.14.1 Update procedure).
As this patch adds a new state to the SMMU (GBPA), it is added
in a new subsection for forward migration compatibility.
GBPA is only migrated if its value is different from the reset value.
It does this to be backward migration compatible if SW didn't write
the register.
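The write rule and the migration condition can be sketched together. The
bit positions (UPDATE at bit 31, ABORT at bit 20), the reset value 0x1000
and the choice to clear UPDATE when storing are assumptions made for this
illustration; all identifiers are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GBPA_UPDATE    (UINT32_C(1) << 31)   /* assumed bit position */
#define SKETCH_GBPA_ABORT     (UINT32_C(1) << 20)   /* assumed bit position */
#define SKETCH_GBPA_RESET_VAL UINT32_C(0x1000)      /* assumed reset value */

/* Hypothetical device state holding only the GBPA register. */
typedef struct {
    uint32_t gbpa;
} SketchSmmu;

/* Register write: ignored unless UPDATE is set in the written value. */
static void gbpa_write(SketchSmmu *s, uint32_t data)
{
    if (data & SKETCH_GBPA_UPDATE) {
        /* The sketch treats the update as complete immediately. */
        s->gbpa = data & ~SKETCH_GBPA_UPDATE;
    }
}

/* Migration predicate: send the register only if it left its reset value. */
static bool gbpa_needed(const SketchSmmu *s)
{
    return s->gbpa != SKETCH_GBPA_RESET_VAL;
}
```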
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230214094009.2445653-1-smostafa@google.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
There is no point in using a void pointer to access the NVIC.
Use the real type to avoid casting it while debugging.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230206223502.25122-11-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The only remaining caller is riscv_load_kernel_and_initrd() which
belongs to the same file.
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230206140022.2748401-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The microchip_icicle_kit, sifive_u, spike and virt boards are now doing
the same steps when '-kernel' is used:
- execute load_kernel()
- load the initrd
- write kernel_cmdline
Let's fold everything inside riscv_load_kernel() to avoid code
repetition. To not change the behavior of boards that aren't calling
riscv_load_initrd(), add a 'load_initrd' flag to riscv_load_kernel() and
allow these boards to opt out from initrd loading.
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Reviewed-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230206140022.2748401-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Next patch will move all calls to riscv_load_initrd() to
riscv_load_kernel(). Machines that want to load initrd will be able to
do so via an extra flag to riscv_load_kernel().
This change will expose a sign-extend behavior that is happening in
load_elf_ram_sym() when running 32 bit guests [1]. This is currently
obscured by the fact that riscv_load_initrd() is using the return of
riscv_load_kernel(), defined as target_ulong, and this return type will
crop the higher 32 bits that would be padded with 1s by the sign
extension when running in 32 bit targets. The changes to be done will
force riscv_load_initrd() to use a uint64_t instead, exposing it to the
padding when dealing with 32 bit CPUs.
There is a discussion about whether load_elf_ram_sym() should or should
not sign extend the value returned by 'lowaddr'. What we can do is to
prevent the behavior change that the next patch will end up doing.
riscv_load_initrd() wasn't dealing with 64 bit kernel entries when
running 32 bit CPUs, and we want to keep it that way.
One way of doing it is to use target_ulong in 'kernel_entry' in
riscv_load_kernel() and rely on the fact that this var will not be sign
extended for 32 bit targets. Another way is to explicitly clear the
higher 32 bits when running 32 bit CPUs for all possibilities of
kernel_entry.
We opted for the latter. This will allow us to be clear about the design
choices made in the function, while also allowing us to add a small
comment about what load_elf_ram_sym() is doing. With this change, the
consolidation patch can do its job without worrying about unintended
behavioral changes.
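Explicitly clearing the higher 32 bits can be sketched as a small helper.
The function name and the boolean parameter are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * For a 32-bit CPU, keep only the low 32 bits of the entry address so
 * that 1s padded in by sign extension do not leak into a 64-bit value.
 * For a 64-bit CPU the address is returned unchanged.
 */
static uint64_t normalize_kernel_entry(uint64_t entry, bool is_32bit)
{
    return is_32bit ? (entry & UINT64_C(0xFFFFFFFF)) : entry;
}
```

For example, a sign-extended 0xFFFFFFFF80000000 becomes 0x80000000 on a
32-bit target and is left as-is on a 64-bit one.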
[1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02281.html
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230206140022.2748401-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now that res_compatible is removed, they don't make sense anymore.
We remove the _only suffix. And to make things clearer we rename
them to must_precopy and can_postcopy.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Nothing assigns to it after previous commit.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Use the SCLP_EVENT() QOM type-checking macro to avoid DO_UPCAST().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230212225144.58660-16-philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Instead, include it in the .c files that use the error reporting
functions.
Message-Id: <20230210111931.1115489-1-thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Include "hw/registerfields.h" in the .c files instead (if needed).
Message-Id: <20230210112315.1116966-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
It's been deprecated since QEMU v6.2, so it should be OK to
finally remove this now.
Message-Id: <20230209161540.1054669-1-thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
virtio_blk_update_config() calls blk_get_geometry and blk_getlength,
and both functions eventually end up calling bdrv_poll_co when not
running in a coroutine:
- blk_getlength is a co_wrapper_mixed function
- blk_get_geometry calls bdrv_get_geometry -> bdrv_nb_sectors, a
co_wrapper_mixed function too
Since we are not running in a coroutine, we need to take the s->blk
AioContext lock, otherwise bdrv_poll_co will inevitably call
AIO_WAIT_WHILE and therefore try to unlock() an AioContext lock
that was never acquired.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2167838
Steps to reproduce the issue: simply boot a VM with
-object '{"qom-type":"iothread","id":"iothread1"}' \
-blockdev '{"driver":"file","filename":"$QCOW2","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,iothread=iothread1,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
and observe that it will fail to boot with "qemu_mutex_unlock_impl: Operation not permitted".
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230208111148.1040083-1-eesposit@redhat.com>
vhost_dev_cleanup() clears vhost_dev, so back up its vqs member
beforehand in order to free the memory it points to.
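The pattern can be sketched in isolation as follows. This is only an
illustration: demo_dev, demo_vq, demo_dev_cleanup() and demo_unrealize()
are hypothetical stand-ins and not the vhost code itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the device structure and its vqs array. */
struct demo_vq {
    int index;
};

struct demo_dev {
    struct demo_vq *vqs;
    int nvqs;
};

/* Mimics a cleanup routine that zeroes the whole device structure. */
void demo_dev_cleanup(struct demo_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
}

/*
 * Tear down the device without leaking the vqs array. Returns 1 when the
 * array was still reachable through the saved pointer and has been freed.
 */
int demo_unrealize(struct demo_dev *dev)
{
    struct demo_vq *vqs = dev->vqs; /* back up before cleanup clears it */

    demo_dev_cleanup(dev);
    assert(dev->vqs == NULL);       /* the member is gone after cleanup */

    free(vqs);                      /* the saved copy is still valid */
    return vqs != NULL;
}
```

Freeing dev->vqs after the cleanup call instead would pass NULL to
free() and silently leak the array.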
Fixes: 98fc1ada4c ("virtio: add vhost-user-fs base device")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230130140225.77964-1-akihiko.odaki@daynix.com>
Tracked down with the help of scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
This commit was created with scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230202133830.2152150-19-armbru@redhat.com>
This commit was created with scripts/clean-includes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230202133830.2152150-18-armbru@redhat.com>
This commit was created with scripts/clean-includes.
All .c should include qemu/osdep.h first. The script performs three
related cleanups:
* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c already includes
it. Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
Drop these, too.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20230202133830.2152150-9-armbru@redhat.com>
* various small cleanups and fixes
* new variant of the supermicrox11-bmc machine using an ast2500-a1 SoC
* at24c_eeprom extension to define eeprom contents with static arrays
* ast10x0 model and test improvements
* avocado update of images to use the latest
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmPiByEACgkQUaNDx8/7
7KF1nw/7BxVb8bxO5T00AnGDFNahDq3ItyisrbOkElDw18oN1eULrtZFH1UopjDE
3HKwR2nb4X7MfcLirVXXxwO1GgIxUkeCsVEY6hpg3TxDPRhPW2toNpNt/WCfFKgq
ZdYdaKgkON/xHQPv6kgQzU2n9Zpuznj0CE9A3k1mAyBcCSitsvu4TW6AQBKmLgUR
9lu61onfX9XoPxZv3abuY3c3UyzevOc6BUT67dmr8naAhHLyBU+DWAW6Kg0Dtc9j
p+bwxIDRimK50DJt9l13OLSAJyhrW1gMsPPGb+48OClpEOhHwq8oqRuMFpbHaQ0/
2MMtMbavXtzBScfmLzR3yw2IwohxSXKMe+7irkJiG/hc8/gtpRATaaS+zfvS0rla
QybWYtJyjmW+QUOnmBsKGwT0PWJcOd3bKtVPgPd7WGeHGVtTBOqU/svExaO+gIv8
uX1gOelEgLmLenUjc/Wp4cHgnePTBK8vG1g3IrEtcCblhwpr0e3/aJgHGgO3cQzH
X9P2buwHyLzjsie9S1ebG9Ceg/VsGQpxNGISZdG+Z4c3+GYu5gcGQcqIAuFmwBnE
QHSNHJXITyWjo7UuqL7e1J7vROUKn0S15V9MO/yOmZgkqubu4Gt3jGcJtIGqIBlu
MFra7SiVjKBnt6PD3aKEdD9uahbqFUfmX9411ZmYUUzpfflKnCQ=
=IY/i
-----END PGP SIGNATURE-----
Merge tag 'pull-aspeed-20230207' of https://github.com/legoater/qemu into staging
aspeed queue:
* various small cleanups and fixes
* new variant of the supermicrox11-bmc machine using an ast2500-a1 SoC
* at24c_eeprom extension to define eeprom contents with static arrays
* ast10x0 model and test improvements
* avocado update of images to use the latest
# gpg: Signature made Tue 07 Feb 2023 08:09:05 GMT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-aspeed-20230207' of https://github.com/legoater/qemu: (25 commits)
aspeed/sdmc: Drop unnecessary scu include
tests/avocado: Test Aspeed Zephyr SDK v00.01.08 on AST1030 board
hw/arm/aspeed_ast10x0: Add TODO comment to use Cortex-M4F
hw/arm/aspeed_ast10x0: Map HACE peripheral
hw/arm/aspeed_ast10x0: Map the secure SRAM
hw/arm/aspeed_ast10x0: Map I3C peripheral
hw/arm/aspeed_ast10x0: Add various unimplemented peripherals
hw/misc/aspeed_hace: Do not crash if address_space_map() failed
hw/watchdog/wdt_aspeed: Log unimplemented registers as UNIMP level
hw/watchdog/wdt_aspeed: Extend MMIO range to cover more registers
hw/watchdog/wdt_aspeed: Rename MMIO region size as 'iosize'
hw/nvram/eeprom_at24c: Make reset behavior more like hardware
hw/arm/aspeed: Add aspeed_eeprom.c
hw/nvram/eeprom_at24c: Add init_rom field and at24c_eeprom_init_rom helper
hw/arm/aspeed: Replace aspeed_eeprom_init with at24c_eeprom_init
hw/arm: Extract at24c_eeprom_init helper from Aspeed and Nuvoton boards
hw/core/loader: Remove declarations of option_rom_has_mr/rom_file_has_mr
tests/avocado/machine_aspeed.py: Mask systemd services to speed up SDK boot
tests/avocado/machine_aspeed.py: update buildroot tests
m25p80: Add the is25wp256 SFPD table
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In this try
- rebase to latest upstream
- same as previous patch
- fix compilation on non linux (userfaultfd.h) (me)
- query-migrationthreads (jiang)
- fix race on reading MultiFDPages_t.block (zhenzhong)
- fix flush of zero copy page send request (zhenzhong)
Please apply.
Previous try:
It includes:
- David Hildenbrand fixes for virtio-mem
- David Gilbert canary to detect problems
- Fix for rdma return values (Fiona)
- Peter Xu uffd_open fixes
- Peter Xu show right downtime for postcopy
- manish.mishra msg fixes
- my vfio changes.
Please apply.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmPhobYACgkQ9IfvGFhy
1yMNaA/9EHDPqrI1HL/VkJG4nNOOsQR7RbburXEberZOzvLjnqpjUD3Ls9qV6rx+
ieHa5T4imYJFk72Wa5vx4r1/dCjtJD2W6jg5+/0nTvYAHrs1U1VRqpuTr0HiXdbJ
ZLLCnW5eDyO3eMaOX0MUkgHgL0FNkc/Lq5ViCTFsMu9O9xMuDLLdAC3cdvslKuOu
X1gKByr9jT817Y9e36amYmRaJKC6Cr/PIekNVFu12HBW79pPusLX8KWEf4RBw4HR
sPwTvMCR/BwZ0+2Lppan60G5rt/ZxDu40oU7y+RHlfWqevl4hDM84/nhjMvEgzc5
a4Ahe2ERGLwwnC8z3l7v9+pEzSGzDoPcnRGvZcpUpk68wTDtxd5Bdq8CwmNUfL07
VzWcYpH0yvmwjBba9jfn9fAVgnG5rVp558XcYLIII3wEToty3UDtm43wSdj2CGr6
cu+IPAp+n/I5G9SRYBTU9ozJz45ttnEe0hxUtZ4I3MuhzHi1VEDAqTWM/X0LyS41
TB3Y5B2KKpJYbPyZEH4nyTeetR2k7alTFzahCgKqVfOgL0nJx54petjS1K+B1P72
g6lhP9WnQ33W+M8S7J/aGEaDJd1lFyFB2Rdjn2ZZnASH/fR9j0mFmXWvulXtjFNp
Sfim3887+Iv4Uzw4VWEe3mM5Ypi/Ba2CmuTjy/pM08Ey8X1Qs5o=
=ZQbR
-----END PGP SIGNATURE-----
Merge tag 'migration-20230206-pull-request' of https://gitlab.com/juan.quintela/qemu into staging
Migration Pull request
In this try
- rebase to latest upstream
- same as previous patch
- fix compilation on non linux (userfaultfd.h) (me)
- query-migrationthreads (jiang)
- fix race on reading MultiFDPages_t.block (zhenzhong)
- fix flush of zero copy page send request (zhenzhong)
Please apply.
Previous try:
It includes:
- David Hildenbrand fixes for virtio-mem
- David Gilbert canary to detect problems
- Fix for rdma return values (Fiona)
- Peter Xu uffd_open fixes
- Peter Xu show right downtime for postcopy
- manish.mishra msg fixes
- my vfio changes.
Please apply.
# gpg: Signature made Tue 07 Feb 2023 00:56:22 GMT
# gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg: aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* tag 'migration-20230206-pull-request' of https://gitlab.com/juan.quintela/qemu: (30 commits)
migration: save/delete migration thread info
migration: Introduce interface query-migrationthreads
multifd: Fix flush of zero copy page send request
multifd: Fix a race on reading MultiFDPages_t.block
migration: check magic value for deciding the mapping of channels
io: Add support for MSG_PEEK for socket channel
migration/dirtyrate: Show sample pages only in page-sampling mode
migration: Perform vmsd structure check during tests
migration: Add canary to VMSTATE_END_OF_LIST
migration/rdma: fix return value for qio_channel_rdma_{readv,writev}
migration: Show downtime during postcopy phase
virtio-mem: Proper support for preallocation with migration
virtio-mem: Migrate immutable properties early
virtio-mem: Fail if a memory backend with "prealloc=on" is specified
migration/ram: Factor out check for advised postcopy
migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST()
migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM)
migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup()
migration/savevm: Move more savevm handling into vmstate_save()
migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The model includes aspeed_scu.h but doesn't appear to require it.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230124062022.298230-1-joel@jms.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This SoC uses a Cortex-M4F. QEMU only implements a M4,
which is good enough. Add a TODO note in case the M4F
is added.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Some SRAM appears to be used by the Secure Boot unit and
crypto accelerators. Name it 'secure sram'.
Note, the SRAM base address was already present but unused
(the 'SBC' index is used for the MMIO peripheral).
Interestingly using CFLAGS=-Winitializer-overrides reports:
../hw/arm/aspeed_ast10x0.c:32:30: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ASPEED_DEV_SBC] = 0x7E6F2000,
^~~~~~~~~~
../hw/arm/aspeed_ast10x0.c:24:30: note: previous initialization is here
[ASPEED_DEV_SBC] = 0x79000000,
^~~~~~~~~~
This fixes the following bus fault with Zephyr:
uart:~$ rsa test
rsa test vector[0]:
[00:00:26.156,000] <err> os: ***** BUS FAULT *****
[00:00:26.157,000] <err> os: Precise data bus error
[00:00:26.157,000] <err> os: BFAR Address: 0x79000000
[00:00:26.158,000] <err> os: r0/a1: 0x79000000 r1/a2: 0x00000000 r2/a3: 0x00001800
[00:00:26.158,000] <err> os: r3/a4: 0x79001800 r12/ip: 0x00000800 r14/lr: 0x0001098d
[00:00:26.158,000] <err> os: xpsr: 0x81000000
[00:00:26.158,000] <err> os: Faulting instruction address (r15/pc): 0x0001e1bc
[00:00:26.158,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:26.158,000] <err> os: Current thread: 0x38248 (shell_uart)
[00:00:26.165,000] <err> os: Halting system
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
[ clg: Fixed size of Secure Boot Controller Memory ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Since I don't have access to the datasheet, the relevant
values were found in:
https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/dts/arm/aspeed/ast10x0.dtsi
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Add more Aspeed watchdog registers from [*].
Since guests can legitimately access them, log the access at
'unimplemented' level instead of 'guest-errors'.
[*] https://github.com/AspeedTech-BMC/zephyr/blob/v00.01.08/drivers/watchdog/wdt_aspeed.c#L31
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Avoid confusing two different things:
- the WDT I/O region size ('iosize')
- the offset at which the SoC maps the WDT ('offset')
While it is often the same, we can map smaller region sizes
at larger offsets.
Here we are interested in the I/O region size, so rename as
'iosize'.
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[ clg: Introduced temporary wdt_offset variable ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
EEPROMs are a form of non-volatile memory. After power-cycling an EEPROM,
I would expect the I2C state machine to be reset to default values, but I
wouldn't really expect the memory to change at all.
The current implementation of the at24c EEPROM resets its internal memory on
reset. This matches the specification in docs/devel/reset.rst:
Cold reset is supported by every resettable object. In QEMU, it means we reset
to the initial state corresponding to the start of QEMU; this might differ
from what is a real hardware cold reset. It differs from other resets (like
warm or bus resets) which may keep certain parts untouched.
But it differs from my intuition. For example, if someone writes some information
to an EEPROM, then AC power cycles their board, they would expect the EEPROM to
retain that information. It's very useful to be able to test things like this
in QEMU as well, to verify software instrumentation like determining the cause
of a reboot.
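The intended behavior can be sketched with a toy model. The struct and
function names below are hypothetical and much simpler than the real
at24c device state; the point is only that reset clears the transfer
state and leaves the stored bytes alone.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ROM_SIZE 256

/* Hypothetical, simplified EEPROM model. */
struct demo_eeprom {
    uint8_t mem[DEMO_ROM_SIZE]; /* non-volatile contents */
    uint16_t cur;               /* current address pointer (volatile) */
    int have_addr;              /* whether an address has been latched */
};

/* Store one byte and remember the address, as an I2C write would. */
void demo_eeprom_write(struct demo_eeprom *ee, uint16_t addr, uint8_t val)
{
    assert(addr < DEMO_ROM_SIZE);
    ee->mem[addr] = val;
    ee->cur = addr;
    ee->have_addr = 1;
}

/* Reset only the I2C state machine; the memory array is not touched. */
void demo_eeprom_reset(struct demo_eeprom *ee)
{
    ee->cur = 0;
    ee->have_addr = 0;
}
```

A byte written before demo_eeprom_reset() is still readable afterwards,
which is what a guest would observe across an AC power cycle on real
hardware.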
Fixes: 5d8424dbd3 ("nvram: add AT24Cx i2c eeprom")
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Link: https://lore.kernel.org/r/20230128060543.95582-6-peter@pjd.dev
Signed-off-by: Cédric Le Goater <clg@kaod.org>
- Create aspeed_eeprom.c and aspeed_eeprom.h
- Include aspeed_eeprom.c in CONFIG_ASPEED meson source files
- Include aspeed_eeprom.h in aspeed.c
- Add fby35_bmc_fruid data
- Use new at24c_eeprom_init_rom helper to initialize BMC FRUID EEPROM with data
from aspeed_eeprom.c
wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
qemu-system-aarch64 -machine fby35-bmc -nographic -mtdblock fby35.mtd
...
user: root
pass: 0penBmc
...
root@bmc-oob:~# fruid-util bb
FRU Information : Baseboard
--------------- : ------------------
Chassis Type : Rack Mount Chassis
Chassis Part Number : N/A
Chassis Serial Number : N/A
Board Mfg Date : Fri Jan 7 10:30:00 2022
Board Mfg : XXXXXX
Board Product : Management Board wBMC
Board Serial : XXXXXXXXXXXXX
Board Part Number : XXXXXXXXXXXXXX
Board FRU ID : 1.0
Board Custom Data 1 : XXXXXXXXX
Board Custom Data 2 : XXXXXXXXXXXXXXXXXX
Product Manufacturer : XXXXXX
Product Name : Yosemite V3.5 EVT2
Product Part Number : XXXXXXXXXXXXXX
Product Version : EVT2
Product Serial : XXXXXXXXXXXXX
Product Asset Tag : XXXXXXX
Product FRU ID : 1.0
Product Custom Data 1 : XXXXXXXXX
Product Custom Data 2 : N/A
root@bmc-oob:~# fruid-util bmc
FRU Information : BMC
--------------- : ------------------
Board Mfg Date : Mon Jan 10 21:42:00 2022
Board Mfg : XXXXXX
Board Product : BMC Storage Module
Board Serial : XXXXXXXXXXXXX
Board Part Number : XXXXXXXXXXXXXX
Board FRU ID : 1.0
Board Custom Data 1 : XXXXXXXXX
Board Custom Data 2 : XXXXXXXXXXXXXXXXXX
Product Manufacturer : XXXXXX
Product Name : Yosemite V3.5 EVT2
Product Part Number : XXXXXXXXXXXXXX
Product Version : EVT2
Product Serial : XXXXXXXXXXXXX
Product Asset Tag : XXXXXXX
Product FRU ID : 1.0
Product Custom Data 1 : XXXXXXXXX
Product Custom Data 2 : Config A
root@bmc-oob:~# fruid-util nic
FRU Information : NIC
--------------- : ------------------
Board Mfg Date : Tue Nov 2 08:51:00 2021
Board Mfg : XXXXXXXX
Board Product : Mellanox ConnectX-6 DX OCP3.0
Board Serial : XXXXXXXXXXXXXXXXXXXXXXXX
Board Part Number : XXXXXXXXXXXXXXXXXXXXX
Board FRU ID : FRU Ver 0.02
Product Manufacturer : XXXXXXXX
Product Name : Mellanox ConnectX-6 DX OCP3.0
Product Part Number : XXXXXXXXXXXXXXXXXXXXX
Product Version : A9
Product Serial : XXXXXXXXXXXXXXXXXXXXXXXX
Product Custom Data 3 : ConnectX-6 DX
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Link: https://lore.kernel.org/r/20230128060543.95582-5-peter@pjd.dev
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Allows users to specify binary data to initialize an EEPROM, allowing users to
emulate data programmed at manufacturing time.
- Added init_rom and init_rom_size attributes to TYPE_AT24C_EE
- Added at24c_eeprom_init_rom helper function to initialize attributes
- If -drive property is provided, it overrides init_rom data
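The precedence between the two data sources can be sketched as below.
demo_eeprom_load() and its parameters are hypothetical names for
illustration, and zero-filling a device that has neither source is an
assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill 'mem' (rom_size bytes) with its initial contents. 'drive' models
 * data coming from a -drive backend and 'init_rom' models a static array
 * supplied by board code; either may be NULL.
 */
void demo_eeprom_load(uint8_t *mem, size_t rom_size,
                      const uint8_t *init_rom, size_t init_rom_size,
                      const uint8_t *drive, size_t drive_size)
{
    assert(init_rom_size <= rom_size && drive_size <= rom_size);

    memset(mem, 0, rom_size);           /* assumption: blank reads as 0 */
    if (drive) {
        memcpy(mem, drive, drive_size); /* a drive overrides init_rom */
    } else if (init_rom) {
        memcpy(mem, init_rom, init_rom_size);
    }
}
```

With both sources present only the drive data ends up in memory; the
init_rom bytes are not merged in behind it.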
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Ninad Palsule <ninadpalsule@us.ibm.com>
Link: https://lore.kernel.org/r/20230128060543.95582-4-peter@pjd.dev
Signed-off-by: Cédric Le Goater <clg@kaod.org>
aspeed_eeprom_init is an exact copy of at24c_eeprom_init, not needed.
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Link: https://lore.kernel.org/r/20230128060543.95582-3-peter@pjd.dev
Signed-off-by: Cédric Le Goater <clg@kaod.org>
This helper is useful in board initialization because it lets users
initialize and realize an EEPROM on an I2C bus with a single function call.
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Link: https://lore.kernel.org/r/20230128060543.95582-2-peter@pjd.dev
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Generated from hardware using the following command and then padding
with 0xff to fill out a power-of-2:
xxd -p /sys/bus/spi/devices/spi0.0/spi-nor/sfdp
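The padding step can be sketched as follows, assuming the raw dump has
already been converted to bytes. demo_pow2_ceil() and demo_pad_table()
are hypothetical helpers for illustration and not part of the m25p80
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest power of two that is >= n (n must be non-zero). */
size_t demo_pow2_ceil(size_t n)
{
    size_t p = 1;

    assert(n > 0 && n <= (SIZE_MAX >> 1) + 1);
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/*
 * Copy 'len' bytes of raw table data into 'out' and pad the tail with
 * 0xff. 'out' must hold at least demo_pow2_ceil(len) bytes. Returns the
 * padded length.
 */
size_t demo_pad_table(uint8_t *out, const uint8_t *raw, size_t len)
{
    size_t padded = demo_pow2_ceil(len);

    memcpy(out, raw, len);
    memset(out + len, 0xff, padded - len);
    return padded;
}
```

A 5-byte input is padded to 8 bytes, while an input that is already a
power of two in length is copied unchanged.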
Cc: Michael Walle <michael@walle.cc>
Cc: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20221221122213.1458540-1-linux@roeck-us.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
With the `size += 4` before the call to `crc32`, the CRC calculation
would overrun the buffer. Size is used in the while loop starting on
line 1009 to determine how much data to write back, with the last
four bytes coming from `crc_ptr`, so we do need to increase it, but
should do so after the computation.
I'm unsure why this use of uninitialized memory in the CRC doesn't
result in CRC errors, but it seems clear to me that it should not be
included in the calculation.
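The ordering issue can be sketched on its own as follows. demo_crc32()
is a plain bitwise CRC-32 written for this illustration and
demo_append_crc() is a hypothetical helper; neither is the code touched
by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t demo_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xffffffffu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/*
 * Append the CRC of the first 'size' bytes to 'buf' and return the new
 * length. 'buf' must have room for size + 4 bytes.
 */
size_t demo_append_crc(uint8_t *buf, size_t size)
{
    uint32_t crc = demo_crc32(buf, size);  /* covers the payload only */

    memcpy(buf + size, &crc, sizeof(crc)); /* host byte order, for brevity */
    return size + 4;                       /* grow size after the CRC */
}
```

Had the length been increased first, the checksum would also cover the
four bytes that have not been written yet.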
Signed-off-by: Stephen Longfield <slongfield@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20221220221437.3303721-1-slongfield@google.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
supermicrox11-bmc is configured with ast2400-a1 SoC. This does not match
the Supermicro documentation for X11 BMCs, and it does not match the
devicetree file in the Linux kernel.
As it turns out, some Supermicro X11 motherboards use AST2400 SoCs,
while others use AST2500.
Introduce new machine type supermicrox11-spi-bmc with AST2500 SoC
to match the devicetree description in the Linux kernel. Hardware
configuration details for this machine type are guesswork and taken
from defaults as well as from the Linux kernel devicetree file.
The new machine type was tested with aspeed-bmc-supermicro-x11spi.dts
from the Linux kernel and with Linux versions 6.0.3 and 6.1-rc2.
Linux booted successfully from initrd and from both SPI interfaces.
Ethernet interfaces were confirmed to be operational.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20221025165109.1226001-1-linux@roeck-us.net
[ clg: Renamed machine to 'supermicro-x11spi-bmc' ]
Message-Id: <20221025165109.1226001-1-linux@roeck-us.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The M2S-FG484 SOM uses a 16 MiB SPI flash (Spansion
S25FL128SDPBHICO). Since the test asset is bigger,
truncate it to the correct size to avoid the following
error when running the test_arm_emcraft_sf2 test:
qemu-system-arm: device requires 16777216 bytes, block backend provides 67108864 bytes
Add comment regarding the M2S-FG484 SOM hardware in
hw/arm/msf2-som.c.
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
There is no need to declare an intermediate "MachineState *ms".
Signed-off-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230206085007.3618715-1-bmeng@tinylab.org>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
As it is now, riscv_compute_fdt_addr() is receiving a dram_base, a
mem_size (which is defaulted to MachineState::ram_size in all boards)
and the FDT pointer. And it makes a very important assumption: the DRAM
interval dram_base + mem_size is contiguous. This is indeed the case for
most boards that use a FDT.
The Icicle Kit board works with 2 distinct RAM banks that are separated
by a gap. We have a lower bank with 1GiB size, a gap follows, then at
64GiB the high memory starts. MachineClass::default_ram_size for this
board is set to 1.5 GiB, and machine_init() is enforcing it as minimal RAM
size, meaning that we'll always have at least 512 MiB in the Hi
RAM area.
Using riscv_compute_fdt_addr() in this board is weird: not only does
the board have sparse RAM while calling the function with the base
address of the Lo RAM area, but it also passes a mem_size that is
guaranteed to extend into the Hi RAM. None of the function's
assumptions hold for this board.
In fact, what makes the function work at all in this case is a
coincidence. Commit 1a475d39ef introduced a 3 GiB boundary for the FDT,
down from 4 GiB, that is enforced if dram_base is lower than 3072 MiB. For
the Icicle Kit board, memmap[MICROCHIP_PFSOC_DRAM_LO].base is 0x80000000
(2 GiB) and it has a 1 GiB size, so it meets the condition for putting
the FDT under the 3 GiB address, which happens to be exactly the end of
DRAM_LO. If the base address of the Lo area started later than 3 GiB, this
function would be unusable by the board. Changing any assumptions inside
riscv_compute_fdt_addr() can also break it by accident.
Let's change riscv_compute_fdt_addr() semantics to be appropriate to the
Icicle Kit board and for future boards that might have sparse RAM
topologies to worry about:
- relax the condition that the dram_base + mem_size area is contiguous,
since this is already not the case today;
- receive an extra 'dram_size' attribute that refers to a contiguous
RAM block that the board wants the FDT to reside on.
Together with 'mem_size' and 'fdt', which are now obtained from a
MachineState pointer, we're able to make clear assumptions based on the
DRAM block and total mem_size available to ensure that the FDT will be put
in a valid RAM address.
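The address selection described above can be modeled roughly as follows.
This is a sketch and not the QEMU function: demo_compute_fdt_addr() is a
hypothetical name, it takes plain integers instead of a MachineState
pointer, and both clamping the block to mem_size and aligning the result
down to 2 MiB are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIB (1024ULL * 1024ULL)

static uint64_t demo_min(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 *   dram_base: start of the contiguous RAM block chosen by the board
 *   dram_size: size of that block
 *   mem_size : total RAM size of the machine
 *   fdt_size : size of the device tree blob
 */
uint64_t demo_compute_fdt_addr(uint64_t dram_base, uint64_t dram_size,
                               uint64_t mem_size, uint64_t fdt_size)
{
    /* Assumption: the block cannot extend past the total RAM size. */
    uint64_t dram_end = dram_base + demo_min(mem_size, dram_size);
    uint64_t limit = dram_end;

    /* Keep the FDT below 3 GiB when the block starts below 3072 MiB. */
    if (dram_base < 3072 * DEMO_MIB) {
        limit = demo_min(dram_end, 3072 * DEMO_MIB);
    }

    assert(fdt_size <= limit - dram_base);

    /* Assumption: align the result down to a 2 MiB boundary. */
    return (limit - fdt_size) & ~(2 * DEMO_MIB - 1);
}
```

With the Icicle Kit numbers from above (base 0x80000000, a 1 GiB Lo
block, 1.5 GiB of total RAM) and a 64 KiB blob, the model places the FDT
at 0xBFE00000, inside the Lo block regardless of how much Hi RAM exists.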
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230201171212.1219375-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
A common trend in other archs is to calculate the fdt address, which is
usually straightforward, and then call a function that loads the
fdt/dtb at that address.
riscv_load_fdt() is doing a bit too much in comparison. It's calculating
the fdt address via an elaborate heuristic to put the FDT at the bottom
of DRAM, and "bottom of DRAM" will vary across boards and
configurations, then it's actually loading the fdt, and finally it's
returning the fdt address used to the caller.
Reduce the existing complexity of riscv_load_fdt() by splitting its code
into a new function, riscv_compute_fdt_addr(), that will take care of
all fdt address logic. riscv_load_fdt() can then be a simple function
that just loads a fdt at the given fdt address.
We're also taking the opportunity to clarify the intentions and
assumptions made by these functions. riscv_load_fdt() now receives a
hwaddr as fdt_addr because nothing prevents the fdt from being loaded
at higher addresses that don't fit in a uint32_t.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230201171212.1219375-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
fdt_pack() can change the fdt size, meaning that a previously read
fdt_totalsize() can hold a now outdated (bigger) value.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230201171212.1219375-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Follow the QEMU convention of naming MachineState pointers as 'ms' by
renaming the instances where we're calling it 'mc'.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230124212234.412630-4-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
We have a convention in other QEMU boards/archs to name MachineState
pointers as either 'machine' or 'ms'. MachineClass pointers are usually
called 'mc'.
The 'virt' RISC-V machine has a lot of instances where MachineState
pointers are named 'mc'. There is nothing wrong with that, but we gain
more compatibility with the rest of the QEMU code base, and easier
reviews, if we follow QEMU conventions.
Rename all 'mc' MachineState pointers to 'ms'. This is a very tedious
and mechanical patch that was produced by doing the following:
- find/replace all 'MachineState *mc' to 'MachineState *ms';
- find/replace all 'mc->fdt' to 'ms->fdt';
- find/replace all 'mc->smp.cpus' to 'ms->smp.cpus';
- replace any remaining occurrences of 'mc' that the compiler complained
about.
Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230124212234.412630-3-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
riscv_socket_count() returns either ms->numa_state->num_nodes or 1
depending on NUMA support. In any case the value can be retrieved only
once and used in the rest of the function.
This will also alleviate the rename we're going to do next by reducing
the instances of MachineState 'mc' inside hw/riscv/virt.c.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230124212234.412630-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
If the CSRs and CSR instructions are disabled because the Zicsr
extension isn't enabled then we want to make sure we don't run any CSR
instructions in the boot ROM.
This patch removes the CSR instructions from the reset-vec if the
extension isn't enabled. We replace the instruction with a NOP instead.
Note that we don't do this for the SiFive U machine, as we are modelling
the hardware in that case.
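The replacement can be sketched as below. demo_patch_reset_vec() and
the macro names are hypothetical, and the caller is assumed to know
which word of the reset vector holds the CSR instruction; the two
instruction encodings are the standard RISC-V ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RISCV_NOP        0x00000013u /* addi x0, x0, 0 */
#define DEMO_CSRR_A0_MHARTID  0xf1402573u /* csrr a0, mhartid */

/*
 * Replace the CSR read at vec[csr_index] with a non-compressed NOP when
 * the Zicsr extension is disabled. Returns true if the word was patched.
 */
bool demo_patch_reset_vec(uint32_t *vec, size_t csr_index, bool has_zicsr)
{
    if (has_zicsr) {
        return false;               /* CSR instructions are available */
    }

    assert(vec[csr_index] == DEMO_CSRR_A0_MHARTID);
    vec[csr_index] = DEMO_RISCV_NOP;
    return true;
}
```

Since the NOP has the same 4-byte length as the instruction it replaces,
the offsets of the remaining words in the vector stay unchanged.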
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1447
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-Id: <20230123035754.75553-1-alistair.francis@opensource.wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Updates the opentitan IRQs to match the latest supported commit of
Opentitan from TockOS.
OPENTITAN_SUPPORTED_SHA := 565e4af39760a123c59a184aa2f5812a961fde47
Memory layout as per [1]
[1] 565e4af397/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20230123063619.222459-1-wilfred.mallawa@opensource.wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration started. Now that we migrate the virtio-mem bitmap early, before
migrating any RAM content, we can safely preallocate memory for all plugged
memory blocks before migrating any RAM content.
This is especially relevant for the following cases:
(1) User errors
With hugetlb/files, if we don't have sufficient backend memory available on
the migration destination, we'll crash QEMU (SIGBUS) during RAM migration
when running out of backend memory. Preallocating memory before actual
RAM migration allows for failing gracefully and informing the user about
the setup problem.
(2) Excluded memory ranges during migration
For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.
To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.
Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts. During
ordinary preallocation, discarding of RAM happens when postcopy is advised.
As the state (bitmap) is loaded after postcopy was advised but before
postcopy starts listening, we have to discard memory we preallocated
immediately again ourselves.
Note that nothing (not even hugetlb reservations) guarantees for postcopy
that backend memory (especially, hugetlb pages) is still free after it
was freed once while discarding RAM. Still, allocating that memory at
least once helps catch some basic setup problems.
Before this change, trying to restore a VM when insufficient hugetlb
pages are around results in the process crashing due to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:
qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
qemu-system-x86_64: load of migration failed: Cannot allocate memory
And we can even introspect the early migration data, including the
bitmap:
$ ./scripts/analyze-migration.py -f STATEFILE
{
"ram (2)": {
"section sizes": {
"0000:00:03.0/mem0": "0x0000000780000000",
"0000:00:04.0/mem1": "0x0000000780000000",
"pc.ram": "0x0000000100000000",
"/rom@etc/acpi/tables": "0x0000000000020000",
"pc.bios": "0x0000000000040000",
"0000:00:02.0/e1000.rom": "0x0000000000040000",
"pc.rom": "0x0000000000020000",
"/rom@etc/table-loader": "0x0000000000001000",
"/rom@etc/acpi/rsdp": "0x0000000000001000"
}
},
"0000:00:03.0/virtio-mem-device-early (51)": {
"tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
"size": "0x0000000040000000",
"bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
},
"0000:00:04.0/virtio-mem-device-early (53)": {
"tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00",
"size": "0x00000001fa400000",
"bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...]
},
[...]
Reported-by: Jing Qi <jinqi@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The bitmap and the size are immutable while migration is active: see
virtio_mem_is_busy(). We can migrate this information early, before
migrating any actual RAM content. Further, all information we need for
sanity checks is immutable as well.
Having this information in place early will, for example, allow for
properly preallocating memory before touching these memory locations
during RAM migration: this way, we can make sure that all memory was
actually preallocated and that any user errors (e.g., insufficient
hugetlb pages) can be handled gracefully.
In contrast, usable_region_size and requested_size can theoretically
still be modified on the source while the VM is running. Keep migrating
these properties the usual, late, way.
Use a new device property to keep behavior of compat machines
unmodified.
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.
Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging
memory dynamically to the guest. Fail if such a memory backend is
provided.
Tested-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Until the previous commit, save_live_pending() was used for ram. Now
with the split into state_pending_estimate() and state_pending_exact()
it is not needed anymore, so remove it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We split the function into two:
- state_pending_estimate: We estimate the remaining state size without
stopping the machine.
- state_pending_exact: We calculate the exact amount of remaining
state.
The only "device" that implements different functions for _estimate()
and _exact() is ram.
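One way a caller could combine the two callbacks is to rely on the
cheap estimate while a lot of state remains and to pay for the exact
value only once the estimate drops to a threshold. The Python sketch
below is a simplified model under that assumption; it is not taken
from the migration code, and DeviceState, RamState and
remaining_state() are hypothetical names.

```python
class DeviceState:
    """Hypothetical handler whose estimate and exact values coincide."""

    def __init__(self, pending):
        self.pending = pending

    def state_pending_estimate(self):
        return self.pending

    def state_pending_exact(self):
        return self.pending


class RamState:
    """Hypothetical ram-like handler: the exact value costs extra work."""

    def __init__(self, estimate, exact):
        self.estimate = estimate
        self.exact = exact
        self.exact_calls = 0  # counts how often the expensive path ran

    def state_pending_estimate(self):
        return self.estimate

    def state_pending_exact(self):
        self.exact_calls += 1
        return self.exact


def remaining_state(handlers, threshold):
    """Return (pending_bytes, is_exact) for a list of handlers."""
    estimate = sum(h.state_pending_estimate() for h in handlers)
    if estimate > threshold:
        # Far from done: the estimate is good enough for now.
        return estimate, False
    # Close to the threshold: ask every handler for the exact amount.
    return sum(h.state_pending_exact() for h in handlers), True
```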
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Add a way to set a backing store for the mac_nvram. Use -drive
file=nvram.img,format=raw,if=mtd to specify the backing file, where
nvram.img must be exactly MACIO_NVRAM_SIZE (8192 bytes) in size.
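A backing file of the required size can be prepared with a few lines
of Python, for example. This is only a sketch: the helper names are
made up, and filling the image with zero bytes is an assumption, since
the message only fixes the size.

```python
import os

MACIO_NVRAM_SIZE = 8192  # size in bytes, as stated in the commit message


def create_nvram_image(path: str, size: int = MACIO_NVRAM_SIZE) -> None:
    """Write a raw image of exactly `size` bytes (zero-filled by assumption)."""
    with open(path, "wb") as f:
        f.write(b"\x00" * size)


def is_valid_nvram_image(path: str, size: int = MACIO_NVRAM_SIZE) -> bool:
    """Check that an existing file has the size the device expects."""
    return os.path.getsize(path) == size
```

The resulting file would then be passed as nvram.img in the -drive
option shown above.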
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <1aadee8f0ca0f56cf1b7c45c3944676a07d91de9.1675297286.git.balaton@eik.bme.hu>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Add a way to set a backing store for the mac_nvram similar to what
spapr_nvram or mac_via PRAM already does to allow to save its contents
between runs.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <4b1605a9e484cc95f6e141f297487a070fd418ac.1675297286.git.balaton@eik.bme.hu>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Use the convention to return bool from functions which take an error
pointer which allows for callers to pass through their error pointer
without needing a local.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <bfce0751e82b031f5e6fb3c32cfbce6325434400.1674001242.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Drop some local variables that could just be substituted at the single
place they were used. This makes the code shorter and simpler.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <165a4ea190af7c09832f50f02004fad82f704898.1674001242.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Some functions name the local variable storing a sysbus device pointer
sysbus_dev while others use sbd. Standardise on the shorter name to be
consistent and make the code easier to read, as the short name is less
distracting and needs fewer line breaks.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <6c79d6903fc11e153f8050a374904c2b5d5db585.1674001242.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
At several places we already have the object pointer with the right
type so we don't need to cast it back and forth. Avoiding these casts
improves readability.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <67b2d4700879c3b4cd574f1faa1a0d1950b3d0ee.1674001242.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
We already have machine in a local variable so no need to use
qdev_get_machine(), also remove now unneeded line break.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <719299533b89aa4516966065eae05c75744f50d3.1672868854.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
The header hw/input/adb.h is included by some files that don't need
it. Clean it up and include only where necessary.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <f46bc751e8426f9d937c9540f2e67d2f0b2cc582.1672868854.git.balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This is not needed in C.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <58f599387dd0739ea1880bfb678872c0be26bf1b.1674333199.git.balaton@eik.bme.hu>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
No need to wrap constants in parentheses.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <9194546b73b05e7098761ec62b2dfd0699b97b65.1674333199.git.balaton@eik.bme.hu>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reported-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230203194312.33834745712@zero.eik.bme.hu>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The name is for the region mapping the PHB xscom registers. It was
apparently a bad cut-and-paste from the per-stack pci xscom area just
above, so we had two regions with the same name.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230127122848.550083-5-fbarrat@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Makes the unimplemented region move together with the CCSR address space
if moved by a bootloader. Moving the CCSR address space isn't
implemented yet but this patch is a preparation for it.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230125130024.158721-5-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
The "platform" node is available through data->node, so use that instead
of making assumptions about the parent device.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20230125130024.158721-4-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This is a follow-up on commit 47a0b1dff7 'hw/ppc/mpc8544ds: Add
platform bus': Both mpc85xx boards now have a platform bus
unconditionally.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20230125130024.158721-3-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This enables support for the 'dumpdtb' QMP/HMP command for all
e500 machines.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20230125130024.158721-2-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
mv64361_pcihost_map_irq() is a reimplementation of
pci_swizzle_map_irq_fn(). Resolve this redundancy.
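For reference, the conventional PCI interrupt swizzle adds the slot
number to the pin index and wraps at the four INTx pins. A small
Python model of that calculation (the function names here are
illustrative, not the QEMU C symbols):

```python
PCI_NUM_PINS = 4  # INTA..INTD


def pci_slot(devfn: int) -> int:
    """Extract the 5-bit slot (device) number from an 8-bit devfn."""
    return (devfn >> 3) & 0x1F


def swizzle_map_irq(devfn: int, pin: int) -> int:
    """Map a device's interrupt pin (0..3) to a parent bus pin."""
    return (pci_slot(devfn) + pin) % PCI_NUM_PINS
```

For example, pin 0 (INTA) of the device in slot 3 maps to pin 3, and
pin 3 of the device in slot 1 wraps around to pin 0.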
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20230106113927.8603-1-shentey@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Slightly improve readability of creating the south bridge by changing
the type of a local variable to avoid some casts within function
arguments, which makes some lines shorter and easier to read.
Also remove an unneeded line break.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230117214545.5E191746369@zero.eik.bme.hu>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This moves the command from MAINTAINERS section "QMP" to section
"ACPI/SMBIOS".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-25-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
This moves these commands from MAINTAINERS section "Human
Monitor (HMP)" to "virtio".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-20-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
This moves these commands from MAINTAINERS section "Human
Monitor (HMP)" to "Rocker" and "Network devices".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-14-armbru@redhat.com>
This moves these commands from MAINTAINERS section "Human
Monitor (HMP)" to "Machine core".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-11-armbru@redhat.com>
This moves these commands from MAINTAINERS section "QMP" to "Machine
core".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20230124121946.1139465-10-armbru@redhat.com>
* Fix physical address resolution for Stage2
* pl011: refactoring, implement reset method
* Support GICv3 with hvf acceleration
* sbsa-ref: remove cortex-a76 from list of supported cpus
* Correct syndrome for ATS12NSO* traps at Secure EL1
* Fix priority of HSTR_EL2 traps vs UNDEFs
* Implement FEAT_FGT for '-cpu max'
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmPdGisZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3iTND/4qnI00PcqPhdZAD083admx
Tn+7OaTd8aaWHDMvbnV3fNsvAEt//j8DdzeBGDLbgfhBuOCPB8z7oDSr7oqczmys
Yjnh25o6IDUYtMnKR+dBwFKGvAqWwM4UdEllkHJvvM+QpnlH7iu9lCkgYr6PvBYA
h4ajfZ5J7C2OmFJZqsKa2Ot3mveFxos1QzgWSmsWNGTJiZTOCiD7AvuCnEsBBaVP
pESY+5eGjVmjv6ocHxcHG4LA456bHAf6JiCgKqgwowRBlJenpsnNgKleIN4gQA/J
wtfLALNe6FkTV9tzK/MgtO1qOhxkUHrnTrYTtTLmk4H1VryFdDvomYB34zBIgfMY
l1LmMba6UCoxtck13D5jv1xkE56o7Z3kqrhyOvP+aHFdi+dvYQ/z+b8pqUeYeSiu
EbVWa/270JwVdbBT08vfW33Ci9n7fxZtRCrvj2viMgOiQOKwXYEb5AVxM9TRZSKC
Y+1m5frW2HQ+KNvjEyHdMJ8q4nFhaS5Bq2A2RMaQCV2QBuBJvFkGL3ul6M0lw/eq
cAZDKN6H/8N2l2DPcPHUy6RMiqUPSnemvFI814ElKeHGa1V1c7Iw9C4lWAV5Ue5E
gotHC1ros89xV0Eg0gaB9UgX8TgbQUfc3g1g6YUvTCfQdvxL0H1rY+wUWU1h1V2r
VdhxI95gUkgmoVnk8KnwIw==
=hk0j
-----END PGP SIGNATURE-----
Merge tag 'pull-target-arm-20230203' of https://git.linaro.org/people/pmaydell/qemu-arm into staging
target-arm queue:
* Fix physical address resolution for Stage2
* pl011: refactoring, implement reset method
* Support GICv3 with hvf acceleration
* sbsa-ref: remove cortex-a76 from list of supported cpus
* Correct syndrome for ATS12NSO* traps at Secure EL1
* Fix priority of HSTR_EL2 traps vs UNDEFs
* Implement FEAT_FGT for '-cpu max'
# gpg: Signature made Fri 03 Feb 2023 14:28:59 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* tag 'pull-target-arm-20230203' of https://git.linaro.org/people/pmaydell/qemu-arm: (33 commits)
target/arm: Enable FEAT_FGT on '-cpu max'
target/arm: Implement MDCR_EL2.TDCC and MDCR_EL3.TDCC traps
target/arm: Implement the HFGITR_EL2.SVC_EL0 and SVC_EL1 traps
target/arm: Implement the HFGITR_EL2.ERET trap
target/arm: Mark up sysregs for HFGITR bits 48..63
target/arm: Mark up sysregs for HFGITR bits 18..47
target/arm: Mark up sysregs for HFGITR bits 12..17
target/arm: Mark up sysregs for HFGITR bits 0..11
target/arm: Mark up sysregs for HDFGRTR bits 12..63
target/arm: Mark up sysregs for HDFGRTR bits 0..11
target/arm: Mark up sysregs for HFGRTR bits 36..63
target/arm: Mark up sysregs for HFGRTR bits 24..35
target/arm: Mark up sysregs for HFGRTR bits 12..23
target/arm: Mark up sysregs for HFGRTR bits 0..11
target/arm: Implement FGT trapping infrastructure
target/arm: Define the FEAT_FGT registers
target/arm: Disable HSTR_EL2 traps if EL2 is not enabled
target/arm: Make HSTR_EL2 traps take priority over UNDEF-at-EL1
target/arm: All UNDEF-at-EL0 traps take priority over HSTR_EL2 traps
target/arm: Move do_coproc_insn() syndrome calculation earlier
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Mark up the sysreg definitions for the registers trapped
by HFGRTR/HFGWTR bits 36..63.
Of these, some correspond to RAS registers which we implement as
always-UNDEF: these don't need any extra handling for FGT because the
UNDEF-to-EL1 always takes priority over any theoretical
FGT-trap-to-EL2.
Bit 50 (NACCDATA_EL1) is for the ACCDATA_EL1 register which is part
of the FEAT_LS64_ACCDATA feature which we don't yet implement.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Fuad Tabba <tabba@google.com>
Message-id: 20230130182459.3309057-14-peter.maydell@linaro.org
Message-id: 20230127175507.2895013-14-peter.maydell@linaro.org
Cortex-A76 supports 40 bits of address space. sbsa-ref's memory
starts above this limit.
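The arithmetic behind this: with 40 address bits the highest
addressable byte is 2^40 - 1 (just under 1 TiB), so any RAM that
begins at or beyond 2^40 is out of reach. A short Python check,
where ASSUMED_RAM_BASE is a placeholder picked as the first address
outside the 40-bit range and is not a value stated in this message:

```python
CORTEX_A76_PA_BITS = 40  # from the commit message

# Assumption for illustration only: the first address that no longer
# fits into 40 bits. The real board's RAM base is not given here.
ASSUMED_RAM_BASE = 1 << 40


def max_physical_address(pa_bits: int) -> int:
    """Highest byte address reachable with pa_bits address bits."""
    return (1 << pa_bits) - 1


def is_addressable(addr: int, pa_bits: int) -> bool:
    """True if addr fits into pa_bits bits of physical address space."""
    return 0 <= addr <= max_physical_address(pa_bits)
```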
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230126114416.2447685-1-marcin.juszkiewicz@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's explicitly list out all accelerators that we support when trying to
determine the supported set of GIC versions. KVM was already separate, so
the only missing one is HVF which simply reuses all of TCG's emulation
code and thus has the same compatibility matrix.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221223090107.98888-3-agraf@csgraf.de
[PMM: Added qtest to the list of accelerators]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Up to now, the finalize_gic_version() code open coded what is essentially
a support bitmap match between host/emulation environment and desired
target GIC type.
This open coding leads to undesirable side effects. For example, a VM with
KVM and -smp 10 will automatically choose GICv3 while the same command
line with TCG will stay on GICv2 and fail the launch.
This patch combines the TCG and KVM matching code paths by making
everything a 2 pass process. First, we determine which GIC versions the
current environment is able to support, then we go through a single
state machine to determine which target GIC mode that means for us.
After this patch, the only user noticeable changes should be consolidated
error messages as well as TCG -M virt supporting -smp > 8 automatically.
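The two passes can be modelled roughly like this. It is a simplified
Python sketch, not the finalize_gic_version() code: the bit values,
the function names, the kvm_probe parameter and the assumption that
TCG can emulate both GICv2 and GICv3 are illustrative; the limit of 8
CPUs for GICv2 follows from the -smp example above.

```python
GICV2 = 1 << 0
GICV3 = 1 << 1
GICV2_MAX_CPUS = 8


def supported_gic_versions(accel: str, kvm_probe: int = 0) -> int:
    """Pass 1: build a bitmap of GIC versions the environment supports."""
    if accel == "tcg":
        return GICV2 | GICV3  # assumption: emulation covers both
    if accel == "kvm":
        return kvm_probe  # whatever probing the host reported
    raise ValueError(f"unknown accelerator: {accel}")


def pick_gic_version(supported: int, smp_cpus: int, requested=None) -> int:
    """Pass 2: a single decision path shared by all accelerators."""
    if requested is not None:
        mask = {2: GICV2, 3: GICV3}[requested]
        if not supported & mask:
            raise ValueError(f"GICv{requested} is not supported here")
        if requested == 2 and smp_cpus > GICV2_MAX_CPUS:
            raise ValueError("GICv2 cannot serve this many CPUs")
        return requested
    # No explicit request: prefer GICv2 when it is sufficient.
    if supported & GICV2 and smp_cpus <= GICV2_MAX_CPUS:
        return 2
    if supported & GICV3:
        return 3
    raise ValueError("no usable GIC version for this configuration")
```

Because the second pass only looks at the bitmap, the same -smp 10
command line yields GICv3 regardless of which accelerator produced
the bitmap.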
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Message-id: 20221223090107.98888-2-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We currently only support GICv2 emulation. To also support GICv3, we will
need to pass a few system registers into their respective handler functions.
This patch adds support for HVF to call into the TCG callbacks for GICv3
system register handlers. This is safe because the GICv3 TCG code is generic
as long as we limit ourselves to EL0 and EL1 - which are the only modes
supported by HVF.
To make sure nobody trips over that, we also annotate callbacks that don't
work in HVF mode, such as EL state change hooks.
With GICv3 support in place, we can run with more than 8 vCPUs.
Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-id: 20230128224459.70676-1-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Current FIFO handling code does not reset RXFE/RXFF flags when guest
resets FIFO by writing to UARTLCR register, although internal FIFO state
is reset to 0 read count. Actual guest-visible flag update will happen
only on next data read or write attempt. As a result of that any guest
that expects RXFE flag to be set (and RXFF to be cleared) after resetting
FIFO will never see that happen.
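The intended behaviour can be illustrated with a tiny Python model of
the receive side. This is a sketch, not the device code: RxFifoModel
and its methods are made-up names, while the flag values follow the
UARTFR bit layout (RXFE is bit 4, RXFF is bit 6).

```python
FLAG_RXFE = 0x10  # receive FIFO empty
FLAG_RXFF = 0x40  # receive FIFO full


class RxFifoModel:
    """Minimal model of the receive FIFO bookkeeping and its flags."""

    def __init__(self, depth: int = 16):
        self.depth = depth
        self.read_count = 0
        self.flags = FLAG_RXFE  # starts out empty

    def put(self) -> None:
        """Account for one received character."""
        if self.read_count >= self.depth:
            return  # FIFO already full, character is dropped
        self.read_count += 1
        self.flags &= ~FLAG_RXFE
        if self.read_count == self.depth:
            self.flags |= FLAG_RXFF

    def reset_fifo(self) -> None:
        """Reset the FIFO and update the guest-visible flags right away."""
        self.read_count = 0
        self.flags &= ~FLAG_RXFF
        self.flags |= FLAG_RXFE
```

Without the two flag updates in reset_fifo(), a guest polling for RXFE
after the reset would keep seeing the stale state until the next data
access.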
Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20230123162304.26254-5-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
PL011 currently lacks a reset method. Implement it.
Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230123162304.26254-4-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Previous change slightly modified the way we handle data writes when
FIFO is disabled. Previously we kept incrementing read_pos and were
storing data at that position, although we only have a
single-register-deep FIFO now. Then we changed it to always store data
at pos 0.
If guest disables FIFO and then proceeds to read data, it will work out
fine, because we still read from current read_pos before setting it to
0.
However, to make code less fragile, introduce a post_load hook for
PL011State and move fixup read FIFO state when FIFO is disabled. Since
we are introducing a post_load hook, also do some sanity checking on
untrusted incoming input state.
Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Message-id: 20230123162304.26254-3-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
PL011 can be in either of 2 modes depending on guest config: FIFO and
single register. The latter could be viewed as a 1-element-deep FIFO.
Current code open-codes a bunch of depth-dependent logic. Refactor FIFO
depth handling code to isolate calculating current FIFO depth.
One functional (albeit guest-invisible) side-effect of this change is
that previously we would always increment s->read_pos in UARTDR read
handler even if FIFO was disabled, now we are limiting read_pos to not
exceed FIFO depth (read_pos itself is reset to 0 if user disables FIFO).
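Isolating the depth calculation might look like the following Python
model. The helper names are illustrative; the FEN bit position
(bit 4 of UARTLCR_H) and the 16-entry FIFO are taken from the PL011
programming model rather than from this message.

```python
LCR_FEN = 0x10   # UARTLCR_H.FEN: FIFOs enabled
FIFO_DEPTH = 16  # depth when FIFOs are enabled


def fifo_depth(lcr: int) -> int:
    """Current FIFO depth: 16 in FIFO mode, 1 in single-register mode."""
    return FIFO_DEPTH if lcr & LCR_FEN else 1


def next_read_pos(read_pos: int, lcr: int) -> int:
    """Advance the read position without exceeding the current depth."""
    return (read_pos + 1) % fifo_depth(lcr)
```

With the FIFO disabled the depth is 1, so the read position always
stays at 0 instead of being incremented on every UARTDR read.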
Signed-off-by: Evgeny Iakovlev <eiakovlev@linux.microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230123162304.26254-2-eiakovlev@linux.microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use the macro instead of two explicit string literals.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20230124232059.4017615-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- qemu-img info: Show protocol-level information
- Move more functions to coroutines
- Make coroutine annotations ready for static analysis
- qemu-img: Fix exit code for errors closing the image
- qcow2 bitmaps: Fix theoretical corruption in error path
- pflash: Only load non-zero parts of backend image to save memory
- Code cleanup and test case improvements
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmPajLURHGt3b2xmQHJl
ZGhhdC5jb20ACgkQfwmycsiPL9aLjg//bk2uodtEZ1X1y/vU3Lmcqd2wh9gv4f9L
csFFf17rrxce/m+4daVISHAzS+Zrwpgixt+vMm2dP+jQTZOg0G7/rcaRYYAYa29Y
Lepr2Qsz0V6HnNpuvUE5hrXiJXU7w5InikLlnoTnwa2H2Nr/wMlzkPX1wh4OdaBy
5KG/sjGVsaotrIdYjI3HnTvU/eytn1IcvLwqcTP2M7u8UMNyZkALyDjbC5QxBkwh
TPVXNGCeDrD6atDOvsmBCkNM3kTmfsGoP5mYyJK5V6iARYV19Nt8tdmt094EFmHk
VBgeY9y+Q6BctcDe31961+oFqGrsLnT3J7mHDhAoaO0BM8wwWCHfCA7yasmGjCj5
HGE7/UJ8DYwGQ9T9N8gsx8NmsfyWgIcyRQGuzld72B4FTzES9NXS1JTUFAZHrDUl
IIaL5bh8aycBKprDBTwvz07a6sDkvmxiR2G0TuS7kFev5O7+qW9dH517PWOWbsRA
3+ICzsHCUE2GLi83KkRkBEqRW0CnNmA9qzWNdPdQ0egsEAtNqmJGaFPRLYqQ0ZwR
gbu7+eK4kUyfqpqieeFxBY53THLE4yxZ3lcg4yFoQWQfKdTCYo69qUNK5AV1hvKY
TzNAuNbOsipL06dRWy4jInbhzenbiYechyEuoqFv0PpHe1D+JrL8QA2hI/JHDwls
enNpKYXdkn4=
=Wf8w
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches
- qemu-img info: Show protocol-level information
- Move more functions to coroutines
- Make coroutine annotations ready for static analysis
- qemu-img: Fix exit code for errors closing the image
- qcow2 bitmaps: Fix theoretical corruption in error path
- pflash: Only load non-zero parts of backend image to save memory
- Code cleanup and test case improvements
# gpg: Signature made Wed 01 Feb 2023 16:00:53 GMT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (38 commits)
qemu-img: Change info key names for protocol nodes
qemu-img: Let info print block graph
iotests/106, 214, 308: Read only one size line
iotests: Filter child node information
block/qapi: Add indentation to bdrv_node_info_dump()
block/qapi: Introduce BlockGraphInfo
block/qapi: Let bdrv_query_image_info() recurse
qemu-img: Use BlockNodeInfo
block: Split BlockNodeInfo off of ImageInfo
block/vmdk: Change extent info type
block/file: Add file-specific image info
block: Improve empty format-specific info dump
block/nbd: Add missing <qemu/bswap.h> include
block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate()
block: Convert bdrv_debug_event() to co_wrapper_mixed
block: Convert bdrv_lock_medium() to co_wrapper
block: Convert bdrv_eject() to co_wrapper
block: Convert bdrv_get_info() to co_wrapper_mixed
block: Convert bdrv_get_allocated_file_size() to co_wrapper
block: use bdrv_co_refresh_total_sectors when possible
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Remove the deprecated OTP config of sifive_u
* Add libfdt to some of our CI jobs that were still missing it
* Use __builtin_bswap() everywhere (all compiler versions support it now)
* Deprecate the HAXM accelerator
* Document PCI devices handling on s390x
* Make Audiodev introspectable
* Improve the runtime of some CI jobs
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmPY59YRHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbXzhxAAmoq2j2sbAf2Vr9tz6Ez2p9oKNYnzUEWb
NGXdvQMcVFKIdjvSYt5ozLC53OFIzuS74X7oHKbdLvGzez3nMCijZIbzN6vNnvd9
HNGum4blNwHEfQcY9hr9y30Iurc7CQu6VtwGF+XXdzQZDbPz1Z4AWvtPTLcTbkxa
PskYJfFvow/oaTHDA/7t+90cxCOixKvQMKXL5ATCtMRGnjlbOAEoPbXUB+yM24mk
9qp1L/8h8pvXfeXlFj+KETmu+eE5ETEOQtqc2KhQqqze2+VMKYxSX2H+sNkJBPDP
En8Mpy+fEdefu8Jcu+M2kMLhf1f3LVf9uARhLZY4/xmOYFg+F3xzwpshnH1bs+Kw
IzWP84uHjE77jSy/wKvYiCx2hdCDwO0G+zym67D1fPzvjzKzUNprV4OIuRzTWah3
6Zli5uuaLrBNjR8SJB1HDmLGKDFgToH9dzfLPtDmW8UPJGkAGcBbPKktLTe5y/4E
del99NqpTx5SAqMmbSMRPZ/vZ7ITdfB0Av3a0GdO8j7eSPb9BOsoZOVD2/iUzab/
P0dBuNqMM8fwywVKqcK+0CJ/npWIJvOqqlwSDqhY1A78G/uRuapOqUwsB/LWRFv5
/1VvHfA2rv4l9o66N5jssS5/D1v5p/UBB6JvlTUvuoJMFTXa9de9XFxYxfkyiaAz
LJl+Dh+aeWk=
=uq7y
-----END PGP SIGNATURE-----
Merge tag 'pull-request-2023-01-31' of https://gitlab.com/thuth/qemu into staging
* qtest improvements
* Remove the deprecated OTP config of sifive_u
* Add libfdt to some of our CI jobs that were still missing it
* Use __builtin_bswap() everywhere (all compiler versions support it now)
* Deprecate the HAXM accelerator
* Document PCI devices handling on s390x
* Make Audiodev introspectable
* Improve the runtime of some CI jobs
# gpg: Signature made Tue 31 Jan 2023 10:05:10 GMT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2023-01-31' of https://gitlab.com/thuth/qemu: (27 commits)
gitlab-ci.d/buildtest: Merge the --without-default-* jobs
tests/qtest/display-vga-test: Add proper checks if a device is available
gitlab-ci.d/buildtest: Remove ppc-softmmu from the clang-system job
qapi, audio: Make introspection reflect build configuration more closely
qapi, audio: add query-audiodev command
docs/s390x/pcidevices: document pci devices on s390x
tests/qtest/boot-serial-test: Constify tests[] array
tests/qtest/vnc-display-test: Disable on Darwin
tests/qtest/vnc-display-test: Use the 'none' machine
tests/qtest/vnc-display-test: Suppress build warnings on Windows
tests/tcg: Do not build/run TCG tests if TCG is disabled
docs/about/deprecated: Mark HAXM in QEMU as deprecated
MAINTAINERS: Abort HAXM maintenance
qemu/bswap: Use compiler __builtin_bswap() on NetBSD
qemu/bswap: Use compiler __builtin_bswap() on FreeBSD
qemu/bswap: Use compiler __builtin_bswap() on Haiku
qemu/bswap: Remove <byteswap.h> dependency
qemu/bswap: Replace bswapXXs() by compiler __builtin_bswap()
qemu/bswap: Replace bswapXX() by compiler __builtin_bswap()
tests/docker/dockerfiles: Add libfdt to the i386 and to the riscv64 container
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
BlockDriver->bdrv_getlength is categorized as IO callback, and it
currently doesn't run in a coroutine. We should let it take a graph
rdlock since the callback traverses the block nodes graph, which however
is only possible in a coroutine.
Therefore turn it into a co_wrapper to move the actual function into a
coroutine where the lock can be taken.
Because now this function creates a new coroutine and polls, we need to
take the AioContext lock where it is missing, for the only reason that
internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the
AioContext lock.
This is especially messy when a co_wrapper creates a coroutine and polls
in bdrv_open_driver, because this function has so many callers in so
many contexts that it can easily lead to deadlocks. Therefore the new
rule for bdrv_open_driver is that the caller must always hold the
AioContext lock of the given bs (except if it is a coroutine), because
the function calls bdrv_refresh_total_sectors() which is now a
co_wrapper.
Once the rwlock is finalized and placed everywhere it needs to be,
we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext
lock.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20230113204212.359076-7-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts commit a7f523c7d1.
The nested event loop is broken by design. Its only user was removed.
Drop the code as well so that nobody ever tries to use it again.
I had to fix a couple of trivial conflicts around return values because
of 025faa872b ("vhost-user: stick to -errno error return convention").
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20230119172424.478268-3-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
This reverts commit db8a3772e3.
Motivation: this is breaking vhost-user with DPDK as reported in [0].
Received unexpected msg type. Expected 22 received 40
Fail to update device iotlb
Received unexpected msg type. Expected 40 received 22
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 1 ring restore failed: -71: Protocol error (71)
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 0 ring restore failed: -71: Protocol error (71)
unable to start vhost net: 71: falling back on userspace virtio
The failing sequence that leads to the first error is:
- QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
socket
- QEMU starts a nested event loop in order to wait for the
VHOST_USER_GET_STATUS response and to be able to process messages from
the slave channel
- DPDK sends a couple of legitimate IOTLB miss messages on the slave
channel
- QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
updates on the master socket
- QEMU expects to receive a response for the latest VHOST_USER_IOTLB_MSG
but it gets the response for the VHOST_USER_GET_STATUS instead
The subsequent errors have the same root cause: the nested event loop
breaks the order by design. It leads QEMU to expect the response to the
latest message sent on the master socket to arrive first.
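The ordering problem described above can be sketched as a toy model.
This is not QEMU code: the ReplyFifo type and all function names are
hypothetical, and the model assumes the peer answers requests on the
master socket strictly in the order they were sent. Only the message
numbers (22 and 40) come from the log above.

```c
#include <stdio.h>

/* Request numbers as quoted in the error log above. */
enum { MSG_IOTLB = 22, MSG_GET_STATUS = 40 };

#define FIFO_CAP 8

/* Hypothetical model of the master socket: replies arrive in send order. */
typedef struct {
    int types[FIFO_CAP];
    int head;
    int tail;
} ReplyFifo;

static void send_request(ReplyFifo *f, int type)
{
    if (f->tail < FIFO_CAP) {
        f->types[f->tail++] = type;
    }
}

/* Read the next reply; return 1 if it is not the expected type, else 0. */
static int wait_reply(ReplyFifo *f, int expected)
{
    if (f->head >= f->tail) {
        return 1; /* nothing to read */
    }
    int got = f->types[f->head++];
    if (got != expected) {
        printf("Received unexpected msg type. Expected %d received %d\n",
               expected, got);
        return 1;
    }
    return 0;
}

/* Nested loop: a second request is sent and awaited before the reply to
 * the first one has been read. Returns the number of mismatches. */
static int nested_loop_errors(void)
{
    ReplyFifo f = { {0}, 0, 0 };
    int errors = 0;

    send_request(&f, MSG_GET_STATUS);
    /* nested event loop handles an IOTLB miss from the slave channel */
    send_request(&f, MSG_IOTLB);
    errors += wait_reply(&f, MSG_IOTLB);      /* reads the GET_STATUS reply */
    errors += wait_reply(&f, MSG_GET_STATUS); /* reads the IOTLB reply */
    return errors;
}

/* Serialized: each reply is read before the next request is sent. */
static int serialized_errors(void)
{
    ReplyFifo f = { {0}, 0, 0 };
    int errors = 0;

    send_request(&f, MSG_GET_STATUS);
    errors += wait_reply(&f, MSG_GET_STATUS);
    send_request(&f, MSG_IOTLB);
    errors += wait_reply(&f, MSG_IOTLB);
    return errors;
}
```

In this model nested_loop_errors() reports two mismatches, in the same
"Expected 22 received 40" / "Expected 40 received 22" pattern as the
log, while serialized_errors() reports none.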
Since this was only needed for DAX enablement, which is still not merged
upstream, just drop the code for now. A working solution will have to
be merged later on. It will likely protect the master socket with a mutex
and service the slave channel with a separate thread, as discussed with
Maxime in the mail thread below.
[0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20230119172424.478268-2-groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Use the proper QOM type definition instead of a magic string.
This also makes the code easier to find with git-grep during a future
refactor.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230117193014.83502-1-philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
The VHOST_USER_ADD/REM_MEM_REG requests should be categorized into
non-vring specific messages, and should be sent only once.
Signed-off-by: Minghao Yuan <yuanmh12@chinatelecom.cn>
Message-Id: <20230123122119.194347-1-yuanmh12@chinatelecom.cn>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Generating descriptions of slots populated by non-hotpluggable devices
is awkward at best and needlessly complicates the hotplug path
(build_append_pcihp_slots). It also builds only a dynamic _DSM for such
slots, which is overkill. Clean it up and let the non-hotplug path
(build_append_pci_bus_devices) handle that task.
Such a cleanup effectively drops dynamic _DSM methods on non-hotpluggable
slots (even though the bus itself is hotpluggable), but in practice it
affects only built-in devices (IDE controllers/various bridges) that don't
use acpi-index anyway, so effectively it doesn't matter (NICs are hotpluggable).
A follow-up series will add static _DSM for non-hotpluggable devices/buses
that will not depend on ACPI PCI hotplug at all, and potentially would
allow us to reuse the non-hotplug path elsewhere (PBX/microvm/arm-virt),
including new support for acpi-index for non-hotpluggable devices.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-40-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Coldplugged bridges are not unpluggable, so there is no need
to describe slots where they are plugged as hotpluggable. To
that effect we have a condition that marks a slot as non-hotpluggable
if it's populated by a coldplugged bridge and prevents generation of
_SUN/_EJ0 objects for it. That leaves a dynamic _DSM method on
such a slot (which also depends on BSEL and pcihp hardware).
This _DSM method provides only dynamic acpi-index support so far,
which is not actually used/supported by the Linux kernel for bridges,
and it's doubtful there will be a need for it at all.
So it's rather pointless to generate acpi-index related AML
for bridges, and we can simplify the hotplug slots generator a bit
more by completely ignoring coldplugged bridges on the hotplug path.
Another point in favor of dropping dynamic _DSM support is
that we can replace it with a static _DSM if necessary, since
a slot with a bridge can't change during VM runtime, and without
any dependency on ACPI PCI hotplug at that.
Later I plan to implement a bridge-specific static _DSM from the
PCI Firmware Specification 3.2
4.6.5. _DSM for Ignoring PCI Boot Configurations
part of the spec, to fix a longstanding issue with fixed IO/MEM
resource assignment that often leads to a hotplugged device
being inoperable within the guest due to the limited IO/MEM
windows programmed on the bridge at boot time.
The expected change when a coldplugged bridge is ignored by the hotplug
code should look like:
- Scope (S18)
- {
- Name (ASUN, 0x03)
- Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method
- {
- Local0 = Package (0x02)
- {
- BSEL,
- ASUN
- }
- Return (PDSM (Arg0, Arg1, Arg2, Arg3, Local0))
- }
- }
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-37-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Split build_append_pci_bus_devices() into a generic part that builds
AML descriptions only for populated slots, which is applicable to
both hotplug-disabled and hotplug-enabled bridges, and a hotplug-only
part that complements the generic AML with hotplug-dependent bits
(that depend on BSEL), like _SUN/_EJ0 entries and dynamic _DSM.
The hotplug part will generate full 'Device' descriptors for
non-populated slots (as before) and complementary
'Scope' descriptors for populated slots that are hotplug capable,
i.e. something like this:
- ...
+ Name (BSEL, 0x03)
+ Scope (S00)
+ {
+ Name (ASUN, Zero)
+ Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method
+ {
+ Local0 = Package (0x02)
+ {
+ BSEL,
+ ASUN
+ }
+ Return (PDSM (Arg0, Arg1, Arg2, Arg3, Local0))
+ }
+ [ ... other hotplug depended bits ]
+ }
While the generic build_append_pci_bus_devices() still calls the hotplug
part at its end, it doesn't really depend on any hotplug bits anymore, and
later both could be completely separated when necessary.
The main benefit though is that both build_append_pci_bus_devices() and
build_append_pcihp_slots() become more readable, and it is easier
to modify them with less risk of affecting the other part. It also opens
the possibility of reusing the generic part elsewhere (microvm, arm/virt).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-34-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-32-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The function doesn't need RW access to the passed-in bus pointer,
so make it const.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-31-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Simplify build_append_pci_bus_devices() a bit by handling bridge
specific logic in the bridge-dedicated AcpiDevAmlIfClass::build_dev_aml
callback.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-30-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
... so that concrete implementations won't have to duplicate it
every time. By default it doesn't do anything unless the leaf class
defines and sets the AcpiDevAmlIfClass::build_dev_aml handler.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-29-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Before switching PCI bridges to the AcpiDevAmlIf interface, ensure that
ignored slots are handled correctly (the existing rule works, but only
if the bridge doesn't have the AcpiDevAmlIf interface).
While at it, rewrite related comments to be less confusing (hopefully).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-28-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Count the number of PCNT methods that actually call Notify
and, if there aren't any, drop PCNT altogether.
It mostly affects 'Q35' tests where there are no root ports/bridges
attached, and the 'PC' machine when ACPI PCI hotplug is
completely disabled.
Expected ASL change:
- Method (PCNT, 0, NotSerialized)
- {
- }
...
Method (_E01, 0, NotSerialized) // _Exx: Edge-Triggered GPE
{
- Acquire (\_SB.PCI0.BLCK, 0xFFFF)
- \_SB.PCI0.PCNT ()
- Release (\_SB.PCI0.BLCK)
}
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-23-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It's a stepping stone to making build_append_pci_bus_devices() suitable
for the AcpiDevAmlIfClass::build_dev_aml callback and lets us further
simplify it by separating PCNT generation from slot descriptions.
It also makes the PCNT call chain ASL much more readable since the call
chain is no longer cluttered by slot descriptors.
Plus, the move will let the next patch easily drop an empty PCNT (pc/q35)
when there is nothing hotpluggable.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-22-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.. and use only BSEL presence to decide how PCNT should be composed.
That simplifies the possible combinations to consider, but mainly it makes
PCIHP AML be governed only by BSEL, which is a property of PCIBus
(aka part of the bridge), and as a result it opens the possibility to convert
build_append_pci_bus_devices() into an AcpiDevAmlIf::build_dev_aml
callback to make bridges self-describing.
PS:
The approach used leaves an unused PCNT when ACPI hotplug is completely
disabled, but that's harmless and follow-up commits will get rid of
it later.
Scope (PCI0)
...
Method (PCNT, 0, NotSerialized)
{
}
...
}
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-19-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When QEMU is started with hotplugged bridges (think migration):
QEMU -S -monitor stdio \
-device pci-bridge,chassis_nr=1 \
-device pci-bridge,bus=pci.1,addr=1.0,chassis_nr=2
(qemu) device_add pci-bridge,id=hpbr,bus=pci.1,addr=2.0,chassis_nr=3
(qemu) cont
it will generate AML calls to hpbr's PCNT, which doesn't exist
since it's a hotplugged bridge. As a result the DSDT becomes malformed,
with the consequence that hotplug might stop working at best or
the guest OS might crash at worst, when it attempts to call the
non-existent PCNT method or when it parses the DSDT again during reboot.
IASL decompiles the malformed AML of the above config's DSDT as:
+ External (_SB_.PCI0.S18_.S10_.PCNT, MethodObj) // Warning: Unknown method, guessing 1 arguments
+ External (_SB_.PCI0.S18_.S19_.PCNT, MethodObj) // Warning: Unknown method, guessing 2 arguments
...
BNUM = One
DVNT (PCIU, One)
DVNT (PCID, 0x03)
- ^S08.PCNT ()
+ ^S19.PCNT (^S10.PCNT (^S08.PCNT ()))
}
}
With BSEL assignment limited only to coldplugged bridges [1],
it should be possible to add a PCNT call to a child bridge only
if the child has the BSEL property, and otherwise ignore it since it's
hotplugged. That should fix the issue.
1) ("pci: acpihp: assign BSEL only to coldplugged bridges")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-13-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ACPI PCI hotplug would be broken after bridge hotplug followed by migration
if the hotplugged bridge were specified on the target's command line.
Currently that's not possible since the 'hotplugged' property has been
read-only for some time now.
The issue would happen due to BSEL being assigned to all bridges
during the 1st 'reset':
source seq:
1. start 'pc' machine => sets BSEL to 0 on pci.0 (host-bridge)
2. hotplug bridge, no bsel is assigned (so far is ok)
target seq:
1. start 'pc' machine with
-S -device pci-bridge,id=hp_br,hotplugged=on
BSEL gets assigned as follows
hp_br: 0
pci.0: 1
as a result, hotplug requests with migrated AML generated on the source
would be misdirected to 'hp_br' instead of the intended pci.0
While it's not an issue at the moment, it's based on implicit assumptions:
* 'hotplugged' property is read-only
* 1st reset happens before QEMU drops into monitor mode
which lets bridges that were hotplugged on the source be added as
hotplugged ones (anything added at that stage counts as hotplugged
(yet another assumption))
All of it looks too fragile to me, so let's restrict BSEL only
to cold-plugged bridges explicitly.
Migration-wise it shouldn't break anything since the assignment order
stays the same:
* user can't specify 'hotplugged=on' on CLI
* user can't specify 'hotplugged=off' at monitor stage or later
On older QEMU versions where 'hotplugged' is RW, hotplug is broken
after migration anyway and we cannot do anything to fix that.
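The BSEL numbering issue described above can be sketched with a small
model. This is not the QEMU implementation: the Bus struct, BSEL_NONE
and assign_bsel() are hypothetical names, and the array order stands in
for whatever order the buses are visited in (chosen here so that the
numbers match the hp_br/pci.0 example in the text).

```c
#include <stdbool.h>
#include <stddef.h>

#define BSEL_NONE (-1) /* hypothetical marker: bus has no BSEL */

/* Hypothetical stand-in for a PCI bus behind a (host-)bridge. */
typedef struct {
    const char *name;
    bool hotplugged;
    int bsel;
} Bus;

/*
 * Hand out BSEL values in visit order. With coldplugged_only set,
 * hotplugged bridges are skipped and do not consume a number, so the
 * numbering of coldplugged buses does not depend on them.
 */
static void assign_bsel(Bus *buses, size_t n, bool coldplugged_only)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        if (coldplugged_only && buses[i].hotplugged) {
            buses[i].bsel = BSEL_NONE;
            continue;
        }
        buses[i].bsel = next++;
    }
}
```

With a target described as { hp_br (hotplugged), pci.0 }, the
unrestricted assignment gives hp_br 0 and pci.0 1, as in the sequence
above. The restricted assignment leaves hp_br without BSEL and gives
pci.0 the value 0, which matches what the source assigned.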
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-12-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
piix4_pm_reset() is calling acpi_pcihp_reset() when ACPI PCI hotplug
is disabled, which leads to assigning BSEL properties to bridges on path
acpi_set_bsel()
...
if (qbus_is_hotpluggable(BUS(bus))) {
// above happens to be true by default (though it's SHPC hotplug handler)
// set BSEL
}
At the moment the issue is masked by the fact that we use not only BSEL
to decide if we should generate hotplug AML but also the pcihp_bridge_en knob.
However, later patches will drop the dependency on pcihp_bridge_en
and use BSEL exclusively to decide if hotplug AML for slots should be built,
which exposes the issue.
We should never call acpi_pcihp_reset() if ACPI PCI hotplug is disabled,
so make it so.
PS:
* Q35 does the right thing (i.e. it calls acpi_pcihp_reset only when pcihp is enabled)
* the issue also makes acpi_pcihp_update() logic run on SHPC enabled bridges,
which seems to be harmless
Fixes: 3d7e78aa77 ("Introduce a new flag for i440fx to disable PCI hotplug on the root bus")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-11-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When ACPI PCI hotplug for Q35 was introduced (6.1), it was implemented
by hiding the HPC capability on the PCIe slot. That however led to a
number of regressions, and to fix them it was decided to keep the HPC cap
exposed in the ACPI PCI hotplug case and force the guest into ACPI PCI
hotplug mode by other means [1].
That reduced the meaning of x-native-hotplug to a compat knob [2] for
the broken 6.1 machine type.
Rename the property to match its current purpose.
1) 211afe5c69 (hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC)
2) c318bef762 (hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-10-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-9-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20230112140312.3096331-8-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.
The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.
However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.
Visually, what happens now is that QEMU appends setup_data to the kernel
image:
          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2
The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:
          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                      d e c o m p r e s s e d   k e r n e l
               |-------------------------------------------------------------|
          0x1000000                                                     0x1000000+l3
The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up. So the decompressed kernel clobbers it.
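The overlap condition above can be written out as a small check. This is
only an illustrative sketch: the function and macro names are made up
here, and the two addresses are the "typical" values quoted in the text,
not values read from any particular kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Typical addresses quoted in the text; both are assumptions here. */
#define KERNEL_LOAD_ADDR 0x100000ULL  /* compressed image load address */
#define DECOMPRESS_ADDR  0x1000000ULL /* start of the decompressed kernel */

/*
 * l1 = compressed image size, l2 = setup_data size, l3 = decompressed
 * kernel size. setup_data occupies [0x100000+l1, 0x100000+l1+l2) and the
 * decompressed kernel occupies [0x1000000, 0x1000000+l3). Return true if
 * the two ranges intersect, i.e. decompression overwrites setup_data.
 */
static bool setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t setup_start = KERNEL_LOAD_ADDR + l1;
    uint64_t setup_end = setup_start + l2;
    uint64_t decomp_end = DECOMPRESS_ADDR + l3;

    return l2 > 0 && setup_start < decomp_end && setup_end > DECOMPRESS_ADDR;
}
```

For example, with an 8 MiB compressed image the appended setup_data ends
well below 0x1000000 and is safe, while a compressed image larger than
0x1000000 - 0x100000 places setup_data inside the decompression target.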
Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.
This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.
The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.
Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>