Only 2 headers require "exec/tb-context.h". Instead of having
all files including "exec/exec-all.h" also including it, directly
include it where it is required:
- accel/tcg/cpu-exec.c
- accel/tcg/translate-all.c
For plugins/plugin.h, we were implicitly relying on
exec/exec-all.h -> exec/tb-context.h -> qemu/qht.h
which is now included directly.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210524170453.3791436-2-f4bug@amsat.org>
[rth: Fix plugins/plugin.h compilation]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tiny step towards a usable preconfig mode (myself)
* Kconfig and LOCK_GUARD cleanups (philippe)
* new x86 CPUID feature (Yang Zhong)
* "-object qtest" support (myself)
* Dirty ring support for KVM (Peter)
* Fixes for 6.0 command line parsing breakage (myself)
* Fix for macOS 11.3 SDK (Katsuhiro)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCuRAQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOL6Qf/bUjQNAUc2QQJya1lu8TEf1o4vjkK
C3EzFPVAj+m2O3OZOGEHcTh8+lDSzBeE2gB3bt4AD+KvFbQGXhLM3gMu/Ztymv8m
3rVEe/NxNyq/CgC307GIwF3in7rEzjH0+WHaOuoU340e3Po1FA7s20VnMysVxxng
4Pf4m4Y0k0eq022HgqZ/r/kbnINxDHagmzuyiFARkt8ooiuj4NyOMW7UKMk3fBvY
MLMPsBe3imWmVnkOF0n/qJ+Svbtx15iLgGIIggshy3rmPereUpIQYaJ9FS6jcXO2
YHuYDc2aGelMU84r+x+9UQra6auzJfc4UbylOsGjopCeFG2aU8rLMphvpw==
=UQwU
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging
* submodule cleanups (Philippe, myself)
* tiny step towards a usable preconfig mode (myself)
* Kconfig and LOCK_GUARD cleanups (philippe)
* new x86 CPUID feature (Yang Zhong)
* "-object qtest" support (myself)
* Dirty ring support for KVM (Peter)
* Fixes for 6.0 command line parsing breakage (myself)
* Fix for macOS 11.3 SDK (Katsuhiro)
# gpg: Signature made Wed 26 May 2021 13:50:12 BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream: (28 commits)
gitlab-ci: use --meson=git for CFI jobs
hw/scsi: Fix sector translation bug in scsi_unmap_complete_noio
configure: Avoid error messages about missing *-config-*.h files
doc: Add notes about -mon option mode=control argument.
qemu-config: load modules when instantiating option groups
vl: allow not specifying size in -m when using -M memory-backend
replication: move include out of root directory
remove qemu-options* from root directory
meson: Set implicit_include_directories to false
tests/qtest/fuzz: Fix build failure
KVM: Dirty ring support
KVM: Disable manual dirty log when dirty ring enabled
KVM: Add dirty-ring-size property
KVM: Cache kvm slot dirty bitmap size
KVM: Simplify dirty log sync in kvm_set_phys_mem
KVM: Provide helper to sync dirty bitmap from slot to ramblock
KVM: Provide helper to get kvm dirty log
KVM: Create the KVMSlot dirty bitmap on flag changes
KVM: Use a big lock to replace per-kml slots_lock
memory: Introduce log_sync_global() to memory listener
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The m68k trace mode is controlled by the top 2 bits in the SR register. Implement
the m68k "any instruction" trace mode where bit T1=1 and bit T0=0 in which the CPU
generates an EXCP_TRACE exception (vector 9 or offset 0x24) after executing each
instruction.
This functionality is used by the NetBSD kernel debugger to allow single-stepping
on m68k architectures.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210519142917.16693-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Introduce a new gen_singlestep_exception() function to be called when generating
the EXCP_DEBUG exception in single-step mode rather than calling
gen_raise_exception(EXCP_DEBUG) directly. This allows for the single-step
exception behaviour for all callers to be managed in a single place.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210519142917.16693-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
In order to consolidate the single-step exception handling into a single
helper, change gen_jmp_tb() so that it calls gen_raise_exception() directly
instead of gen_exception(). This ensures that all single-step exceptions are
now handled directly by gen_raise_exception().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210519142917.16693-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The m68k translator currently checks the DisasContextBase singlestep_enabled
boolean directly to determine whether to single-step execution. Soon
single-stepping may also be triggered by setting the appropriate bits in the
SR register so centralise the check into a single is_singlestepping()
function.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210519142917.16693-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
source side always blocks if postcopy is only enabled at source side.
users are not able to cancel this migration in this case.
Let source side have chance to cancel this migration
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <20210525080552.28259-4-lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Typo fix
destination side:
$ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888
(qemu) migrate_set_capability postcopy-ram on
(qemu)
dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet
Segmentation fault (core dumped)
(gdb) bt
#0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272
#1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986
#2 0x0000563c9e51f02a in aio_dispatch_handler
(ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329
#3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372
#4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382
#5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306
#6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231
#8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254
#9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725
#11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
The rdma return path will not be created when qemu incoming is starting
since migrate_copy() is false at that moment, then a NULL return path
rdma was referenced if the user enabled postcopy later.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
rdma_freeaddrinfo() is the reverse operation of rdma_getaddrinfo()
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210525080552.28259-2-lizhijian@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
the error path after calling qemu_rdma_dest_init() should do rdma cleanup
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <20210520081148.17001-1-lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
A segmentation fault was triggered when i try to abort a postcopy + rdma
migration.
since rdma_ack_cm_event releases a uninitialized cm_event in these case.
like below:
2496 ret = rdma_get_cm_event(rdma->channel, &cm_event);
2497 if (ret) {
2498 perror("rdma_get_cm_event after rdma_connect");
2499 ERROR(errp, "connecting to destination!");
2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault
2501 goto err_rdma_source_connect;
2502 }
Refer to the rdma_get_cm_event() code, cm_event will be
updated/changed only if rdma_get_cm_event() returns 0. So it's okey to
remove the ack in error patch.
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <20210519064740.10828-1-lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Replaced a malloc() call and its respective free() with
GLib's g_try_malloc() and g_free() calls.
Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
Message-Id: <20210314032324.45142-8-ma.mandourr@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Replaced a call to calloc() and its respective free() call
with GLib's g_try_new0() and g_free() calls.
Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com>
Message-Id: <20210314032324.45142-7-ma.mandourr@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There is no reason to set it in label "err". We should be able to set
it right after sending reply. It is easier to read.
Also got rid of label "err" because now only thing it was doing was
return a code. We can return from the error location itself and no
need to first jump to label "err".
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-8-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
In virtio_send_data_iov() we are checking first for short read and then
EOF condition. Change the order. Basically check for error and EOF first
and last remaining piece is short ready which will lead to retry
automatically at the end of while loop.
Just that it is little simpler to read to the code. There is no need
to call "continue" and also one less call of "len-=ret".
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-7-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We need to skip bytes in two cases.
a. Before we start reading into in_sg, we need to skip iov_len bytes
in the beginning which typically will have fuse_out_header.
b. If preadv() does a short read, then we need to retry preadv() with
remainig bytes and skip the bytes preadv() read in short read.
For case a, there is no reason that skipping logic be inside the while
loop. Move it outside. And only retain logic "b" inside while loop.
Also get rid of variable "skip_size". Looks like we can do without it.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-6-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
in_sg_left seems to be being used primarly for debugging purpose. It is
keeping track of how many bytes are left in the scatter list we are
reading into.
We already have another variable "len" which keeps track how many bytes
are left to be read. And in_sg_left is greater than or equal to len. We
have already ensured that in the beginning of function.
if (in_len < tosend_len) {
fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n",
__func__, elem->index, tosend_len);
ret = E2BIG;
goto err;
}
So in_sg_left seems like a redundant variable. It probably was useful for
debugging when code was being developed. Get rid of it. It helps simplify
this function.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-5-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There are places where we need to skip few bytes from front of the iovec
array. We have our own custom code for that. Looks like iov_discard_front()
can do same thing. So use that helper instead.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-4-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
pvreadv() can return following.
- error
- 0 in case of EOF
- short read
We seem to handle all the cases already. We are retrying read in case
of short read. So another check for short read seems like dead code.
Get rid of it.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-3-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We don't seem to check for EINTR and retry. There are other places
in code where we check for EINTR. So lets add a check.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Message-Id: <20210518213538.693422-2-vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit f61fe11aa6 broke hmp_loadvm() by adding an incorrect negation
when converting from 0/-errno return values to a bool value. The result
is that loadvm resumes the VM now if it failed and keeps it stopped if
it failed. Fix it to restore the old behaviour and do it the other way
around.
Fixes: f61fe11aa6
Cc: qemu-stable@nongnu.org
Reported-by: Yanhui Ma <yama@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210511163151.45167-1-kwolf@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Use uint8_t for (unsigned) byte.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-7-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use uint16_t for (unsigned) 16-bit word.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-6-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use uint8_t for (unsigned) byte, and uint16_t for (unsigned)
16-bit word.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-5-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use uint8_t for (unsigned) byte, and uint16_t for (unsigned)
16-bit word.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-4-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
To ease the file review, sort the declarations by the size of
the access (8, 16, 32). Simple code movement, no logical change.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-3-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
To ease the file review, sort the declarations by the size of
the access (8, 16, 32). Simple code movement, no logical change.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210518183655.1711377-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Ensure that the meson submodule is checked out by the check targets,
as they will need it to run "meson test".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
check_lba_range expects sectors to be expressed in original qdev blocksize, but
scsi_unmap_complete_noio was translating them to 512 block sizes, which was
causing sense errors in the larger LBAs in devices using a 4k block size.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/345
Signed-off-by: Kit Westneat <kit.westneat@gmail.com>
Message-Id: <20210521142829.326217-1-kit.westneat@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When compiling with --disable-system there is a harmless yet still
annoying error message at the end of the "configure" step:
sed: can't read *-config-devices.h: No such file or directory
When only building the tools or docs, without any emulator at all,
there is even an additional message about missing *-config-target.h
files.
Fix it by checking whether any of these files are available before
using them.
Fixes: e0447a834d ("configure: Poison all current target-specific #defines")
Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210519113840.298174-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The mode=control argument configures a QMP monitor.
Signed-off-by: Ali Shirvani <alishir@routerhosting.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <0799f0de89ad2482672b5d61d0de61e6eba782da.1621407918.git.alishir@routerhosting.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Right now the SPICE module is special cased to be loaded when processing
of the -spice command line option. However, the spice option group
can also be brought in via -readconfig, in which case the module is
not loaded.
Add a generic hook to load modules that provide a QemuOpts group,
and use it for the "spice" and "iscsi" groups.
Fixes: #194
Fixes: https://bugs.launchpad.net/qemu/+bug/1910696
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Starting in QEMU 6.0's commit f5c9fcb82d ("vl: separate
qemu_create_machine", 2020-12-10), a function have_custom_ram_size()
replaced the return value of set_memory_options().
The purpose of the return value was to record the presence of
"-m size", and if it was not there, change the default RAM
size to the size of the memory backend passed with "-M
memory-backend".
With that commit, however, have_custom_ram_size() is now queried only
after set_memory_options has stored the fixed-up RAM size in QemuOpts for
"future use". This was actually the only future use of the fixed-up RAM
size, so remove that code and fix the bug.
Cc: qemu-stable@nongnu.org
Fixes: f5c9fcb82d ("vl: separate qemu_create_machine", 2020-12-10)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The replication.h file is included from migration/colo.c and tests/unit/test-replication.c,
so it should be in include/.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These headers are also included from softmmu/vl.c, so they should be
in include/. Remove qemu-options-wrapper.h, since elsewhere
we include "template" headers directly and #define the parameters in
the including file; move qemu-options.h to include/.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Without this, libvixl cannot be compiled with macOS 11.3 SDK due to
include file name conflict (usr/include/c++/v1/version conflicts with
VERSION).
Signed-off-by: Katsuhiro Ueno <uenobk@gmail.com>
Message-Id: <CA+pCdY09+OQfXq3YmRNuQE59ACOq7Py2q4hqOwgq4PnepCXhTA@mail.gmail.com>
Tested-by: Alexander Graf <agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On Fedora 32, using clang (version 10.0.1-3.fc32) we get:
tests/qtest/fuzz/fuzz.c:237:5: error: implicit declaration of function 'qemu_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
qemu_init(result.we_wordc, result.we_wordv, NULL);
^
qemu_init() is declared in "sysemu/sysemu.h", include this
header to fix.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210513162008.3922223-1-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM dirty ring is a new interface to pass over dirty bits from kernel to the
userspace. Instead of using a bitmap for each memory region, the dirty ring
contains an array of dirtied GPAs to fetch (in the form of offset in slots).
For each vcpu there will be one dirty ring that binds to it.
kvm_dirty_ring_reap() is the major function to collect dirty rings. It can be
called either by a standalone reaper thread that runs in the background,
collecting dirty pages for the whole VM. It can also be called directly by any
thread that has BQL taken.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-11-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is for KVM_CLEAR_DIRTY_LOG, which is only
useful for KVM_GET_DIRTY_LOG. Skip enabling it for kvm dirty ring.
More importantly, KVM_DIRTY_LOG_INITIALLY_SET will not wr-protect all the pages
initially, which is against how kvm dirty ring is used - there's no way for kvm
dirty ring to re-protect a page before it's notified as being written first
with a GFN entry in the ring! So when KVM_DIRTY_LOG_INITIALLY_SET is enabled
with dirty ring, we'll see silent data loss after migration.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-10-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a parameter for dirty gfn count for dirty rings. If zero, dirty ring is
disabled. Otherwise dirty ring will be enabled with the per-vcpu gfn count as
specified. If dirty ring cannot be enabled due to unsupported kernel or
illegal parameter, it'll fallback to dirty logging.
By default, dirty ring is not enabled (dirty-gfn-count default to 0).
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-9-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cache it too because we'll reference it more frequently in the future.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-8-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_physical_sync_dirty_bitmap() on the whole section is inaccurate, because
the section can be a superset of the memslot that we're working on. The result
is that if the section covers multiple kvm memslots, we could be doing the
synchronization for multiple times for each kvmslot in the section.
With the two helpers that we just introduced, it's very easy to do it right now
by calling the helpers.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-7-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an
awkward way from the MemoryRegionSection that passed in from the
caller. The truth is for each KVMSlot the ramblock offset never
change for the lifecycle. Cache the ramblock offset for each KVMSlot
into the structure when the KVMSlot is created.
With that, we can further simplify kvm_physical_sync_dirty_bitmap()
with a helper to sync KVMSlot dirty bitmap to the ramblock dirty
bitmap of a specific KVMSlot.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-6-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a helper kvm_slot_get_dirty_log() to make the function
kvm_physical_sync_dirty_bitmap() clearer. We can even cache the as_id
into KVMSlot when it is created, so that we don't even need to pass it
down every time.
Since at it, remove return value of kvm_physical_sync_dirty_bitmap()
because it should never fail.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Previously we have two places that will create the per KVMSlot dirty
bitmap:
1. When a newly created KVMSlot has dirty logging enabled,
2. When the first log_sync() happens for a memory slot.
The 2nd case is lazy-init, while the 1st case is not (which is a fix
of what the 2nd case missed).
To do explicit initialization of dirty bitmaps, what we're missing is
to create the dirty bitmap when the slot changed from not-dirty-track
to dirty-track. Do that in kvm_slot_update_flags().
With that, we can safely remove the 2nd lazy-init.
This change will be needed for kvm dirty ring because kvm dirty ring
does not use the log_sync() interface at all.
Also move all the pre-checks into kvm_slot_init_dirty_bitmap().
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Per-kml slots_lock will bring some trouble if we want to take all slots_lock of
all the KMLs, especially when we're in a context that we could have taken some
of the KML slots_lock, then we even need to figure out what we've taken and
what we need to take.
Make this simple by merging all KML slots_lock into a single slots lock.
Per-kml slots_lock isn't anything that helpful anyway - so far only x86 has two
address spaces (so, two slots_locks). All the rest archs will be having one
address space always, which means there's actually one slots_lock so it will be
the same as before.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the memory listener may want to do log synchronization without
being able to specify a range of memory to sync but always globally.
Such a memory listener should provide this new method instead of the
log_sync() method.
Obviously we can also achieve similar thing when we put the global
sync logic into a log_sync() handler. However that's not efficient
enough because otherwise memory_global_dirty_log_sync() may do the
global sync N times, where N is the number of flat ranges in the
address space.
Make this new method be exclusive to log_sync().
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210506160549.130416-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qtest server right now can only be created using the -qtest
and -qtest-log options. Allow an alternative way to create it
using "-object qtest,chardev=...,log=...".
This is part of the long term plan to make more (or all) of
QEMU configurable through QMP and preconfig mode.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>