With all the facilities in place, send the requested page directly from the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled. This works because urgent pages go over a
separate channel; the only shared data is the dirty bitmap, and that is
protected by bitmap_mutex.
Note that since we're moving ownership of the urgent channel from the
migration thread to the rp thread, the rp thread also becomes responsible
for managing the qemufile, e.g. closing it properly when the migration is
paused. For this, let migration_release_from_dst_file() cover shutdown of
the urgent channel too, and rename it to migration_release_dst_files() to
better reflect what it does.
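A minimal sketch of the new fast path, assuming the helper names used
elsewhere in this series (postcopy_preempt_active(),
ram_save_host_page_urgent()); exact signatures may differ:

    /* In ram_save_queue_pages(), on the rp-return thread: */
    if (postcopy_preempt_active()) {
        /*
         * The urgent channel is owned by the rp thread, so the page
         * can be sent right here.  The dirty bitmap is the only
         * shared state; protect it with bitmap_mutex.
         */
        qemu_mutex_lock(&rs->bitmap_mutex);
        ret = ram_save_host_page_urgent(pss);
        qemu_mutex_unlock(&rs->bitmap_mutex);
        return ret;
    }
    /* Otherwise fall back to queuing the request as before. */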
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Since we use PageSearchStatus to represent a channel, it makes perfect
sense to make last_sent_block (i.e. the state behind
RAM_SAVE_FLAG_CONTINUE) per-channel rather than global, because each
channel can be sending pages from different ramblocks.
Hence move it from RAMState into PageSearchStatus.
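Roughly, the field moves like this (illustrative excerpt; other fields
omitted):

    struct PageSearchStatus {
        /* Current block being searched */
        RAMBlock *block;
        /* Current page to search from */
        unsigned long page;
        /* Set once we wrap around */
        bool complete_round;
        /* Last block from where we have sent data (was in RAMState) */
        RAMBlock *last_sent_block;
    };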
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
We used to allocate the PSS structure on the stack for precopy when
sending pages. Make it static instead, so that it describes per-channel
ram migration status. Declare RAM_CHANNEL_MAX instances, preparing for
postcopy to use them, even though this patch does not yet start using the
2nd instance.
This should have no functional change per se, but it already starts to
export PSS information via RAMState, so that e.g. one PSS channel can
start to reference the other PSS channel.
Always protect PSS access using the same RAMState.bitmap_mutex. We already
do so, so no code change is needed, just some comment updates. Maybe we
should consider renaming bitmap_mutex some day, as it is becoming a bigger
and more widely used mutex for ram state, but leave that for later.
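An illustrative excerpt of the RAMState change (other fields omitted):

    struct RAMState {
        /*
         * One PSS instance per channel: pss[RAM_CHANNEL_PRECOPY] is
         * used by the migration thread; pss[RAM_CHANNEL_POSTCOPY] is
         * reserved for the postcopy preempt channel.
         */
        PageSearchStatus pss[RAM_CHANNEL_MAX];
        /* Protects the dirty bitmaps and (now) the PSS structures */
        QemuMutex bitmap_mutex;
    };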
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add a helper to initialize the PSS structures.
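Roughly:

    static void pss_init(PageSearchStatus *pss, RAMBlock *rb,
                         unsigned long page)
    {
        pss->block = rb;
        pss->page = page;
        /* Restart the scan round */
        pss->complete_round = false;
    }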
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Introduce pss_channel for PageSearchStatus, defined as "the migration
channel to be used to transfer this host page".
We used to have rs->f, which is a mirror of MigrationState.to_dst_file.
Since the initial version of postcopy preempt, rs->f can change
dynamically depending on which channel we want to use.
But that still doesn't grant full concurrency for sending pages in e.g.
different threads, because rs->f can only be either the PRECOPY or the
POSTCOPY channel. It needs to be per-thread too.
PageSearchStatus is actually a good structure to leverage if we want
multiple threads sending pages. Sending a single guest page may not make
sense, so make the granule the "host page", and let the PSS structure
specify a QEMUFile* for migrating a specific host page. That opens up the
possibility of using different channels in different threads, each with
its own PSS structure.
The PSS prefix can be slightly misleading here, because e.g. for the
upcoming postcopy channel/thread usage it's not "searching" (or scanning)
at all, but sending the explicit page that was requested. However, since
PSS has existed for some years, keep the name as-is until someone
complains.
This patch mostly just replaces rs->f with pss->pss_channel. No
functional change is intended yet, but it prepares for finally dropping
rs->f and making ram_save_guest_page() thread safe.
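An illustrative call site (surrounding code simplified):

    /* Before: the channel is the global mirror. */
    ram_transferred_add(save_page_header(rs, rs->f, block, offset));
    /* After: the channel comes from the PSS, so each thread/channel
     * can carry its own QEMUFile. */
    ram_transferred_add(save_page_header(rs, pss->pss_channel, block,
                                         offset));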
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Migration code has a lot to do with host pages. Teaching the PSS core
about the concept of host pages helps a lot and makes the code cleaner.
Meanwhile, this prepares for future changes that can leverage the new PSS
helpers introduced here to send host pages in another thread.
Three more fields are introduced for this:
(1) host_page_sending: this is set to true when QEMU is sending a host
page, false otherwise.
(2) host_page_{start|end}: these point to the start/end of the host page
we're sending, and they're only valid when host_page_sending==true.
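As an illustrative excerpt, the new PSS fields look like:

    struct PageSearchStatus {
        /* Whether we're sending a host page right now */
        bool host_page_sending;
        /* The first target page of the current host page */
        unsigned long host_page_start;
        /* Boundary (exclusive) of the current host page */
        unsigned long host_page_end;
    };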
For example, when we look up the next dirty page on the ramblock with
host_page_sending==true, we will not look for anything beyond the current
host page boundary. This can be slightly more efficient than the current
code, because currently we set pss->page to the next dirty bit (which can
be beyond the current host page boundary) and reset it to the host page
boundary if we find that it went past it.
With the above, we can easily make migration_bitmap_find_dirty()
self-contained by updating pss->page properly. The rs* parameter is
removed because it wasn't even used in the old code.
When sending a host page, we should use the pss helpers like this:
- pss_host_page_prepare(pss): called before sending the host page
- pss_within_range(pss): are we still within the current host page?
- pss_host_page_finish(pss): called after sending the host page
Then we can use ram_save_target_page() to save one small page, as
sketched below. Currently ram_save_host_page() is still the only user.
If another function is added to send host pages (e.g. in the return path
thread) in the future, it should follow the same style.
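A sketch of such a send loop (error handling and bitmap handling are
trimmed; helper signatures are illustrative):

    pss_host_page_prepare(pss);
    while (pss_within_range(pss)) {
        /* Send one small (target) page within the host page. */
        if (ram_save_target_page(rs, pss) < 0) {
            break;
        }
        /*
         * Advance pss->page to the next dirty bit; with
         * host_page_sending set, the lookup never crosses the host
         * page boundary.
         */
        migration_bitmap_find_dirty(pss->block, pss);
    }
    pss_host_page_finish(pss);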
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
To prepare for thread-safety of the page accounting, at least the
following counters need to be accessed atomically:
ram_counters.transferred
ram_counters.duplicate
ram_counters.normal
ram_counters.postcopy_bytes
There are a lot of other counters, but they won't be accessed outside the
migration thread, so they're still safe to access without atomic ops.
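A sketch using QEMU's Stat64 helpers (qemu/stats64.h), which provide
lock-free add/read even on 32-bit hosts; the container name here is
illustrative:

    #include "qemu/stats64.h"

    typedef struct {
        Stat64 transferred;
        Stat64 duplicate;
        Stat64 normal;
        Stat64 postcopy_bytes;
    } MigrationAtomicStats;

    MigrationAtomicStats ram_atomic_counters;

    /* Any thread can update... */
    stat64_add(&ram_atomic_counters.transferred, bytes);
    /* ...and any thread can read a consistent value. */
    uint64_t total = stat64_get(&ram_atomic_counters.transferred);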
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Don't take the bitmap mutex when sending pages or when being throttled by
migration_rate_limit() (which is admittedly a bit odd to call here in the
ram code, but it still seems helpful).
This prepares for the possibility of concurrently sending pages from more
than one thread via ram_save_host_page(): all threads may need
bitmap_mutex to operate on the bitmaps, so neither a sendmsg() nor any
kind of qemu_sem_wait() blocking one thread should keep the others from
progressing.
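The shape of the change, much simplified (the real control flow lives in
ram_find_and_save_block()): the mutex now covers only bitmap scanning and
clearing, not the send itself.

    qemu_mutex_lock(&rs->bitmap_mutex);
    if (find_dirty_block(rs, pss)) {
        /*
         * The dirty bit was already cleared under the lock, so it is
         * safe to drop it while potentially blocking in sendmsg() or
         * in migration_rate_limit().
         */
        qemu_mutex_unlock(&rs->bitmap_mutex);
        pages = ram_save_host_page(rs, pss);
        qemu_mutex_lock(&rs->bitmap_mutex);
    }
    qemu_mutex_unlock(&rs->bitmap_mutex);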
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Remove the references to RAMState.f in compress_page_with_multi_thread()
and flush_compressed_data().
The compression code currently isn't compatible with having more than one
channel (it wouldn't know which channel to flush the compressed data to),
so to keep things simple we always flush to the default to_dst_file port
until someone adds multi-channel support; rs->f, on the other hand, can
now really change (after postcopy preempt was introduced).
There should be no functional change at all after this patch, since
wherever rs->f was referenced in the compression code, it must have been
to_dst_file.
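Illustratively, inside flush_compressed_data():

    MigrationState *ms = migrate_get_current();
    /*
     * Compression only ever writes to the default channel, so flush
     * there explicitly instead of going through rs->f.
     */
    len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);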
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The 2nd check of RAM_SAVE_FLAG_CONTINUE is a bit redundant. Use a boolean
to make it clearer.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The major change is to replace "!save_page_use_compression()" with
"xbzrle_enabled" to make the intent clear.
Reasoning:
(1) When compression is enabled, "!save_page_use_compression()" is
exactly the same as checking "xbzrle_enabled".
(2) When compression is disabled, "!save_page_use_compression()" always
returns true. We used to try calling the xbzrle code in that case; after
this change we won't, and we shouldn't need to.
While at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page(),
because with this change it's no longer needed.
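The condition flip, roughly:

    -    if (!save_page_use_compression(rs)) {
    +    if (rs->xbzrle_enabled) {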
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add a helper that tells whether postcopy preempt is enabled and currently
active.
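Roughly:

    static bool postcopy_preempt_active(void)
    {
        return migrate_postcopy_preempt() && migration_in_postcopy();
    }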
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Any call to ram_find_and_save_block() needs to take the bitmap mutex. We
used to not take it for most of ram_save_complete() because we thought we
were the only ones left using the bitmap, but that's no longer true after
the preempt full patchset was applied, since the return path can be
taking it too.
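A sketch of the flush loop in ram_save_complete() with the lock held
(simplified):

    qemu_mutex_lock(&rs->bitmap_mutex);
    while (true) {
        int pages = ram_find_and_save_block(rs);
        if (pages == 0) {        /* no more dirty pages */
            break;
        }
        if (pages < 0) {         /* error */
            ret = pages;
            break;
        }
    }
    qemu_mutex_unlock(&rs->bitmap_mutex);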
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
We were recalculating it left and right. We plan to change those values
in the next patches.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
We were calling qemu_target_page_size() left and right.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Fix authorship of commits 266aaedc37~..ac14949821. See commit
3bd2608db7 ("maint: Add .mailmap entries for patches claiming
list authorship") for rationale.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221208155535.28363-1-philmd@linaro.org>
This reverts commit 14dccc8ea6.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221205113007.683505-1-gaosong@loongson.cn>
Merge tag 'pull-loongarch-20221202' of https://gitlab.com/gaosong/qemu into staging
pull for 7.2-rc4
* tag 'pull-loongarch-20221202' of https://gitlab.com/gaosong/qemu:
hw/loongarch/virt: Add cfi01 pflash device
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme into staging
hw/nvme fixes
* fixes for aio cancellation in commands that may issue several
aios
* tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme:
hw/nvme: remove copy bh scheduling
hw/nvme: fix aio cancel in dsm
hw/nvme: fix aio cancel in zone reset
hw/nvme: fix aio cancel in flush
hw/nvme: fix aio cancel in format
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio: regression fix
Fixes a regression with migration and vsock; as fixing that exposes some
known issues in vhost-user cleanup, this attempts to fix those as well.
More work on vhost-user is needed :)
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
include/hw: VM state takes precedence in virtio_device_should_start
hw/virtio: generalise CHR_EVENT_CLOSED handling
hw/virtio: add started_vu status field to vhost-user-gpio
vhost: enable vrings in vhost_dev_start() for vhost-user devices
tests/qtests: override "force-legacy" for gpio virtio-mmio tests
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The SET ADDRESS SPACE CONTROL FAST instruction is not privileged; it can
be used from problem state, too. Only the switch to the home address
space is privileged and should still generate a privilege exception. This
bug is e.g. causing programs like Java that use the "getcpu" vDSO kernel
function to crash (see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990417#26 ).
While we're at it, also check whether DAT is enabled. If it is not, the
instruction is supposed to generate a special-operation exception.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/655
Message-Id: <20221201184443.136355-1-thuth@redhat.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When running the migration test compiled with Clang from Fedora 37 and
with sanitizers enabled, there is an error complaining about unlink():
../tests/qtest/migration-test.c:1072:12: runtime error: null pointer
passed as argument 1, which is declared to never be null
/usr/include/unistd.h:858:48: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../tests/qtest/migration-test.c:1072:12 in
(test program exited with status code 1)
TAP parsing error: Too few tests run (expected 33, got 20)
The data->clientcert and data->clientkey pointers can indeed be unset in
some tests, so we have to check them before calling unlink() on them.
While we're at it, I also noticed that the code frees only some but not
all of the allocated strings in this function, and indeed, valgrind also
complains about memory leaks here. So let's call g_free() on all
allocated strings to avoid leaking memory.
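For instance (only clientcert/clientkey are known to be optional; the
remaining g_free() calls follow the same pattern):

    if (data->clientcert) {
        unlink(data->clientcert);
        g_free(data->clientcert);
    }
    if (data->clientkey) {
        unlink(data->clientkey);
        g_free(data->clientkey);
    }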
Message-Id: <20221125083054.117504-1-thuth@redhat.com>
Tested-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
In get_physical_address, the canonical address check failed to
set TranslateFault.stage2, which resulted in an uninitialized
read from the struct when reporting the fault in x86_cpu_tlb_fill.
Adjust all error paths to use structure assignment so that the
entire struct is always initialized.
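The pattern, as a hedged sketch (the struct and enum names follow the
TranslateFault definitions introduced by the Fixes commit; treat them as
illustrative):

    /*
     * Assign the whole struct so that no field, stage2 included, can
     * be left stale on any error path.
     */
    *err = (TranslateFault){
        .exception_index = EXCP0E_PAGE,
        .error_code = error_code,
        .cr2 = addr,
        .stage2 = S2_NONE,
    };
    return false;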
Reported-by: Daniel Hoffman <dhoff749@gmail.com>
Fixes: 9bbcf37219 ("target/i386: Reorg GET_HPHYS")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221201074522.178498-1-richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1324
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MMX state is saved/restored by FSAVE/FRSTOR, so these instructions are
not illegal opcodes even if CR4.OSFXSR=0. Make sure that validate_vex
takes the prefix into account and only checks HF_OSFXSR_MASK in the
presence of an SSE instruction.
Fixes: 20581aadec ("target/i386: validate VEX prefixes via the instructions' exception classes", 2022-10-18)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1350
Reported-by: Helge Konetzka (@hejko on gitlab.com)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.
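The shape of the change (illustrative):

    -    qemu_bh_schedule(iocb->bh);
    +    iocb->common.cb(iocb->common.opaque, iocb->ret);
    +    qemu_aio_unref(iocb);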
Fixes: 796d20681d ("hw/nvme: reimplement the copy command to allow aio cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
When the DSM operation is cancelled asynchronously, we set iocb->ret to
-ECANCELED. However, the callback function only checks the return value
of the completed aio, which may have completed successfully prior to the
cancellation; thus the callback ends up continuing the DSM operation
instead of bailing out. Fix this.
Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.
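For the first fix, an illustrative excerpt of the callback: consult the
operation-wide status first, not just this aio's result.

    static void nvme_dsm_cb(void *opaque, int ret)
    {
        NvmeDSMAIOCB *iocb = opaque;

        if (iocb->ret < 0) {
            /* Cancelled: iocb->ret is already -ECANCELED. */
            goto done;
        } else if (ret < 0) {
            /* This particular aio failed; record it. */
            iocb->ret = ret;
            goto done;
        }
        /* ...otherwise continue with the next DSM range... */
    }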
Fixes: d7d1474fd8 ("hw/nvme: reimplement dsm to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
If the zone reset operation is cancelled but the block unmap operation
completes normally, the callback will continue resetting the next zone,
since it neglects to check iocb->ret, which will have been set to
-ECANCELED. Make sure this is checked, and bail out if an error is
present.
Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.
Fixes: 63d96e4ffd ("hw/nvme: reimplement zone reset to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Make sure that iocb->aiocb is NULL'ed when cancelling.
Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.
Fixes: 38f4ac65ac ("hw/nvme: reimplement flush to allow cancellation")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
There are several bugs in the async cancel code for the Format command.
Firstly, cancelling a format operation neglects to set iocb->ret as well
as to clear iocb->aiocb after cancelling the underlying aiocb, which
causes the aio callback to ignore the cancellation. Trivial fix.
Secondly, and worse, because the request is queued up for posting to the
CQ in a bottom half, if the cancellation is due to the submission queue
being deleted (which calls blk_aio_cancel), the req structure is
deallocated in nvme_del_sq prior to the bottom half being scheduled.
Fix this by simply removing the bottom half; there is no reason to defer
it anyway.
Fixes: 3bcf26d3d6 ("hw/nvme: reimplement format nvm to allow cancellation")
Reported-by: Jonathan Derrick <jonathan.derrick@linux.dev>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
The VM status should always take precedence over the device status for
these checks. This ensures the device is in the correct state when we
suspend the VM prior to migration. This restores the checks to the order
they were in before the refactoring moved things around.
While we are at it, let's improve our documentation of the various fields
involved and document the two functions.
Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to virtio_device_started)
Fixes: 259d69c00b (hw/virtio: introduce virtio_device_should_start)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-6-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
...and use it for both vhost-user-blk and vhost-user-gpio. This avoids
the circular close by deferring shutdown due to disconnection until a
later point. vhost-user-blk already had this mechanism in place, so
generalise it as a vhost-user helper function and use it for both blk and
gpio devices.
While we are at it, we also fix up vhost-user-gpio to re-establish the
event handler after closedown so we can reconnect later.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221130112439.2527228-5-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As per the fix to vhost-user-blk in f5b22d06fb ("vhost: recheck dev state
in the vhost_migration_log routine"), we really should track the
connection state and the started state separately.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-4-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 02b61f38d3 ("hw/virtio: incorporate backend features in features")
properly negotiates VHOST_USER_F_PROTOCOL_FEATURES with the vhost-user
backend, but we forgot to enable vrings as specified in
docs/interop/vhost-user.rst:
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
Some vhost-user front-ends already did this by calling
vhost_ops.vhost_set_vring_enable() directly:
- backends/cryptodev-vhost.c
- hw/net/virtio-net.c
- hw/virtio/vhost-user-gpio.c
But most didn't do that, so we would leave the vrings disabled and some
backends would not work. We observed this issue with the Rust version of
virtiofsd [1], which uses the event loop [2] provided by the
vhost-user-backend crate, where requests are not processed if the vring
is not enabled.
Let's fix this issue by enabling the vrings in vhost_dev_start() for
vhost-user front-ends that don't already do this directly. Do the same in
vhost_dev_stop(), where we disable the vrings.
[1] https://gitlab.com/virtio-fs/virtiofsd
[2] https://github.com/rust-vmm/vhost/blob/240fc2966/crates/vhost-user-backend/src/event_loop.rs#L217
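A sketch of the change in vhost_dev_start(), mirrored with a disable in
vhost_dev_stop(); "vrings" stands for the new knob that front-ends
already doing this themselves leave unset (error handling simplified):

    if (vrings && hdev->vhost_ops->vhost_set_vring_enable) {
        /*
         * Only meaningful when VHOST_USER_F_PROTOCOL_FEATURES was
         * negotiated; otherwise the rings start enabled already.
         */
        r = hdev->vhost_ops->vhost_set_vring_enable(hdev, true);
        if (r < 0) {
            return r;
        }
    }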
Fixes: 02b61f38d3 ("hw/virtio: incorporate backend features in features")
Reported-by: German Maglione <gmaglione@redhat.com>
Tested-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20221123131630.52020-1-sgarzare@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20221130112439.2527228-3-alex.bennee@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The GPIO device is a VIRTIO_F_VERSION_1 devices but running with a
legacy MMIO interface we miss out that feature bit causing confusion.
For the GPIO test force the mmio bus to support non-legacy so we can
properly test it.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1333
Message-Id: <20221130112439.2527228-2-alex.bennee@linaro.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
bdrv_*() APIs expect a valid BlockDriverState. Calling them with bs=NULL
leads to undefined behavior.
Jonathan Cameron reported the following NULL pointer dereference when a
VM with a virtio-blk device and a memory-backend-file object is
terminated:
1. qemu_cleanup() closes all drives, setting blk->root to NULL
2. qemu_cleanup() calls user_creatable_cleanup(), which results in a RAM
block notifier callback because the memory-backend-file is destroyed.
3. blk_unregister_buf() is called by virtio-blk's BlockRamRegistrar
notifier callback and undefined behavior occurs.
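A sketch of the guard (simplified):

    void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
    {
        BlockDriverState *bs = blk_bs(blk);

        /* The drive may already be closed during shutdown. */
        if (!bs) {
            return;
        }
        bdrv_unregister_buf(bs, host, size);
    }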
Fixes: baf422684d ("virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint")
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221121211923.1993171-1-stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-6-philmd@linaro.org>
Have qxl_get_check_slot_offset() return false if the requested buffer
size does not fit within the slot memory region. Similarly,
qxl_phys2virt() now returns NULL in such a case, and
qxl_dirty_one_surface() aborts.
This avoids a buffer overrun in the host pointer returned by
memory_region_get_ram_ptr().
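An illustrative call site (the requested type and the error return value
are assumptions, not the exact patch):

    QXLReleaseInfo *info = qxl_phys2virt(qxl, ext->cmd.data,
                                         MEMSLOT_GROUP_GUEST,
                                         sizeof(QXLReleaseInfo));
    if (!info) {
        return 1;   /* bogus guest pointer/size: refuse the command */
    }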
Fixes: CVE-2022-4144 (out-of-bounds read)
Reported-by: Wenxu Yin (@awxylitol)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1336
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-5-philmd@linaro.org>
Currently qxl_phys2virt() doesn't check for buffer overrun. In order to
do so in the next commit, pass the buffer size as an argument.
For QXLCursor in qxl_render_cursor() -> qxl_cursor() we verify the size
of the chunked data ahead of time, checking that we can access
'sizeof(QXLCursor) + chunk->data_size' bytes. Since in the
SPICE_CURSOR_TYPE_MONO case the cursor is assumed to fit in one chunk, no
change is required. In SPICE_CURSOR_TYPE_ALPHA the read-ahead is handled
in qxl_unpack_chunks().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-4-philmd@linaro.org>