With the last few changes to the fdset infrastructure, we now allow
multifd to use an fdset when migrating to a file. This is useful for
the scenario where the management layer wants to have control over the
migration file.
By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.
Document the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:
- the fdset must contain two fds that are not duplicates between
themselves;
- if direct-io is to be used, exactly one of the fds must have the
O_DIRECT flag set;
- the file must be opened with WRONLY on the migration source side;
- the file must be opened with RDONLY on the migration destination
side.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're about to enable the use of O_DIRECT in the migration code and
due to the alignment restrictions imposed by filesystems we need to
make sure the flag is only used when doing aligned IO.
The migration will do parallel IO to different regions of a file, so
we need to use more than one file descriptor. Those cannot be obtained
by duplicating (dup()) since duplicated file descriptors share the
file status flags, including O_DIRECT. If one migration channel does
unaligned IO while another sets O_DIRECT to do aligned IO, the
filesystem would fail the unaligned operation.
The add-fd QMP command along with the fdset code are specifically
designed to allow the user to pass a set of file descriptors with
different access flags into QEMU to be later fetched by code that
needs to alternate between those flags when doing IO.
Extend the fdset matching to behave the same with the O_DIRECT flag.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The tests are only allowed to run in systems that know about the
O_DIRECT flag and in filesystems which support it.
Note: this also brings back migrate_set_parameter_bool() which went
away when we removed the compression tests. I copied it verbatim.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When multifd is used along with mapped-ram, we can take benefit of a
filesystem that supports the O_DIRECT flag and perform direct I/O in
the multifd threads. This brings a significant performance improvement
because direct-io writes bypass the page cache which would otherwise
be thrashed by the multifd data which is unlikely to be needed again
in a short period of time.
To be able to use a multifd channel opened with O_DIRECT, we must
ensure that a certain aligment is used. Filesystems usually require a
block-size alignment for direct I/O. The way to achieve this is by
enabling the mapped-ram feature, which already aligns its I/O properly
(see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
By setting O_DIRECT on the multifd channels, all writes to the same
file descriptor need to be aligned as well, even the ones that come
from outside multifd, such as the QEMUFile I/O from the main migration
code. This makes it impossible to use the same file descriptor for the
QEMUFile and for the multifd channels. The various flags and metadata
written by the main migration code will always be unaligned by virtue
of their small size. To workaround this issue, we'll require a second
file descriptor to be used exclusively for direct I/O.
The second file descriptor can be obtained by QEMU by re-opening the
migration file (already possible), or by being provided by the user or
management application (support to be added in future patches).
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.
This is currently only used with the mapped-ram migration that has a
clear window guaranteed to perform aligned writes.
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We want to make use of the Error object to report fdset errors from
qemu_open_internal() and passing the error pointer to qemu_open_old()
would require changing all callers. Move the file channel to the new
API instead.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
I'm keeping the EACCES because callers expect to be able to look at
errno.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Remove fds right away instead of setting the ->removed flag. We don't
need the extra complexity of having a cleanup function reap the
removed entries at a later time.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
monitor_fdsets_cleanup() currently has three responsibilities:
1- Remove the fds that have been marked for removal(->removed=true) by
qmp_remove_fd(). This is overly complicated, but ok.
2- Remove any file descriptors that have been passed into QEMU and
never duplicated[1,2]. A file descriptor without duplicates
indicates that no part of QEMU has made use of it. This is
problematic because the current implementation does it only if the
guest is not running and the monitor is closed.
3- Remove/free fdsets that have become empty due to the above
removals. This is ok.
The scenario described in (2) is starting to show some cracks now that
we're trying to consume fds from the migration code:
- Doing cleanup every time the last monitor connection closes works to
reap unused fds, but also has the side effect of forcing the
management layer to pass the file descriptors again in case of a
disconnect/re-connect, if that happened to be the only monitor
connection.
Another side effect is that removing an fd with qmp_remove_fd() is
effectively delayed until the last monitor connection closes.
The usage of mon_refcount is also problematic because it's racy.
- Checking runstate_is_running() skips the cleanup unless the VM is
running and avoids premature cleanup of the fds, but also has the
side effect of blocking the legitimate removal of an fd via
qmp_remove_fd() if the VM happens to be in another state.
This affects qmp_remove_fd() and qmp_query_fdsets() in particular
because requesting a removal at a bad time (guest stopped) might
cause an fd to never be removed, or to be removed at a much later
point in time, causing the query command to continue showing the
supposedly removed fd/fdset.
Note that file descriptors that *have* been duplicated are owned by
the code that uses them and will be removed after qemu_close() is
called. Therefore we've decided that the best course of action to
avoid the undesired side-effects is to stop managing non-duplicated
file descriptors.
1- efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
2- ebe52b592d ("monitor: Prevent removing fd from set during init")
Reviewed-by: Peter Xu <peterx@redhat.com>
[fix logic mistake: s/fdset_free/fdset_free_if_empty]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Introduce new functions to remove and free no longer used fds and
fdsets.
We need those to decouple the remove/free routines from
monitor_fdset_cleanup() which will go away in the next patches.
The new functions:
- monitor_fdset_free/_if_empty() will be used when a monitor
connection closes and when an fd is removed to cleanup any fdset
that is now empty.
- monitor_fdset_fd_free() will be used to remove one or more fds that
have been explicitly targeted by qmp_remove_fd().
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Those functions are not needed, one remove function should already
work. Clean it up.
Here the code doesn't really care about whether we need to keep that dupfd
around if close() failed: when that happens something got very wrong,
keeping the dup_fd around the fdsets may not help that situation so far.
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[add missing return statement, removal during traversal is not safe]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add a test for file migration using fdset. The passing of fds is more
complex than using a file path. This is also the scenario where it's
most important we ensure that the initial migration stream offset is
respected because the fdset interface is the one used by the
management layer when providing a non empty migration file.
Note that fd passing is not available on Windows, so anything that
uses add-fd needs to exclude that platform.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When doing file migration, QEMU accepts an offset that should be
skipped when writing the migration stream to the file. The purpose of
the offset is to allow the management layer to put its own metadata at
the start of the file.
We have tests for this in migration-test, but only testing that the
migration stream starts at the correct offset and not that it actually
leaves the data intact. Unsurprisingly, there's been a bug in that
area that the tests didn't catch.
Fix the tests to write some data to the offset region and check that
it's actually there after the migration.
While here, switch to using g_get_file_contents() which is more
portable than mmap().
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
When the "file:" migration support was added we missed the special
case in the qemu_open_old implementation that allows for a particular
file name format to be used to refer to a set of file descriptors that
have been previously provided to QEMU via the add-fd QMP command.
When using this fdset feature, we should not truncate the migration
file because being given an fd means that the management layer is in
control of the file and will likely already have some data written to
it. This is further indicated by the presence of the 'offset'
argument, which indicates the start of the region where QEMU is
allowed to write.
Fix the issue by replacing the O_TRUNC flag on open by an ftruncate
call, which will take the offset into consideration.
Fixes: 385f510df5 ("migration: file URI offset")
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We forgot to drop the reference to the QIOChannel in the error path of
the offset adjustment. Do it now.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers
util/bufferiszero: Split out host include files
util/bufferiszero: Add loongarch64 vector acceleration
accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded
target/sparc: use signed denominator in sdiv helper
linux-user: Make TARGET_NR_setgroups affect only the current thread
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZzRoMdHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9Y7gf/ZUTGjCUdAO7W7J5e
Z3JLUNOfUHO6PxoE05963XJc+APwKiuL6Yo2bnJo6km7WM50CoaX9/7L9CXD7STg
s3eUJ2p7FfvOADZgO373nqRrB/2mhvoywhDbVJBl+NcRvRUDW8rMqrlSKIAwDIsC
kwwTWlCfpBSlUgm/c6yCVmt815+sGUPD2k/p+pIzAVUG6fGYAosC2fwPzPajiDGX
Q+obV1fryKq2SRR2dMnhmPRtr3pQBBkISLuTX6xNM2+CYhYqhBrAlQaOEGhp7Dx3
ucKjvQFpHgPOSdQxb/HaDv81A20ZUQaydiNNmuKQcTtMx3MsQFR8NyVjH7L+fbS8
JokjaQ==
=yVKz
-----END PGP SIGNATURE-----
Merge tag 'pull-tcg-20240619' of https://gitlab.com/rth7680/qemu into staging
tcg/loongarch64: Support 64- and 256-bit vectors
tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers
util/bufferiszero: Split out host include files
util/bufferiszero: Add loongarch64 vector acceleration
accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded
target/sparc: use signed denominator in sdiv helper
linux-user: Make TARGET_NR_setgroups affect only the current thread
# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZzRoMdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9Y7gf/ZUTGjCUdAO7W7J5e
# Z3JLUNOfUHO6PxoE05963XJc+APwKiuL6Yo2bnJo6km7WM50CoaX9/7L9CXD7STg
# s3eUJ2p7FfvOADZgO373nqRrB/2mhvoywhDbVJBl+NcRvRUDW8rMqrlSKIAwDIsC
# kwwTWlCfpBSlUgm/c6yCVmt815+sGUPD2k/p+pIzAVUG6fGYAosC2fwPzPajiDGX
# Q+obV1fryKq2SRR2dMnhmPRtr3pQBBkISLuTX6xNM2+CYhYqhBrAlQaOEGhp7Dx3
# ucKjvQFpHgPOSdQxb/HaDv81A20ZUQaydiNNmuKQcTtMx3MsQFR8NyVjH7L+fbS8
# JokjaQ==
# =yVKz
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 19 Jun 2024 01:58:43 PM PDT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* tag 'pull-tcg-20240619' of https://gitlab.com/rth7680/qemu: (24 commits)
tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers
target/sparc: use signed denominator in sdiv helper
linux-user: Make TARGET_NR_setgroups affect only the current thread
accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded
util/bufferiszero: Add loongarch64 vector acceleration
util/bufferiszero: Split out host include files
tcg/loongarch64: Enable v256 with LASX
tcg/loongarch64: Support LASX in tcg_out_vec_op
tcg/loongarch64: Split out vdvjukN in tcg_out_vec_op
tcg/loongarch64: Remove temp_vec from tcg_out_vec_op
tcg/loongarch64: Support LASX in tcg_out_{mov,ld,st}
tcg/loongarch64: Split out vdvjvk in tcg_out_vec_op
tcg/loongarch64: Support LASX in tcg_out_addsub_vec
tcg/loongarch64: Simplify tcg_out_addsub_vec
tcg/loongarch64: Support LASX in tcg_out_dupi_vec
tcg/loongarch64: Use tcg_out_dup_vec in tcg_out_dupi_vec
tcg/loongarch64: Support LASX in tcg_out_dupm_vec
tcg/loongarch64: Support LASX in tcg_out_dup_vec
tcg/loongarch64: Simplify tcg_out_dup_vec
util/loongarch64: Detect LASX vector support
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Simplify the logic for two-part, 32-bit pc-relative addresses.
Rather than assume all such fit in int32_t, do some arithmetic
and assert a result, do some arithmetic first and then check
to see if the pieces are in range.
Cc: qemu-stable@nongnu.org
Fixes: dacc51720d ("tcg/loongarch64: Implement tcg_out_mov and tcg_out_movi")
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reported-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The result has to be done with the signed denominator (b32) instead of
the unsigned value passed in argument (b).
Cc: qemu-stable@nongnu.org
Fixes: 1326010322 ("target/sparc: Remove CC_OP_DIV")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2319
Signed-off-by: Clément Chigot <chigot@adacore.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240606144331.698361-1-chigot@adacore.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Like TARGET_NR_setuid, TARGET_NR_setgroups should affect only the
calling thread, and not the entire process. Therefore, implement it
using a syscall, and not a libc call.
Cc: qemu-stable@nongnu.org
Fixes: 19b84f3c35 ("added setgroups and getgroups syscalls")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240614154710.1078766-1-iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
For TBs crossing page boundaries, the 2nd page will never be
recorded/removed, as the index of the 2nd page is computed from the
address of the 1st page. This is due to a typo, fix it.
Cc: qemu-stable@nongnu.org
Fixes: deba78709a ("accel/tcg: Always lock pages before translation")
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240612133031.15298-1-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use inline assembly because no release compiler allows
per-function selection of the ISA.
Tested-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Split out host/bufferiszero.h.inc for x86, aarch64 and generic
in order to avoid an overlong ifdef ladder.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Fixes a bug in the immediate shifts, because the exact
encoding depends on the element size.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use TCG_VEC_TMP0 directly.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Each element size has a different encoding, so code cannot
be shared in the same way as with tcg_out_dup_vec.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can implement this with fld_d, fst_d for load and store,
and then use the normal v128 operations in registers.
This will improve support for guests which use v64.
Reviewed-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Make the MemOp enum cast explicit to use the QEMU
headers with a C++ compiler.
Signed-off-by: Roman Kiryanov <rkir@google.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240618224528.878425-1-rkir@google.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Boolean return value is reversed, to align with QEMU_ALLOCATED_FLAG, so
all callers must be adapted. Also rename share_surface variable in
vga_draw_graphic() to reduce confusion.
No functional change.
Suggested-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20240605131444.797896-4-kraxel@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
In case the display surface uses a shared buffer (i.e. uses vga vram
directly instead of a shadow) go unshare the buffer before clearing it.
This avoids vga memory corruption, which in turn fixes unblanking not
working properly with X11.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2067
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20240605131444.797896-2-kraxel@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
This eliminates the polling in cocoa_refresh and implements the
propagation of the mouse mode change from absolute to relative.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Message-ID: <20240322-mouse-v1-1-0b7d4d9bdfbf@daynix.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Remove myself from spice and ui entries.
Flip status to "Orphan" for entries which have nobody else listed.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-ID: <20240528083858.836262-5-kraxel@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Remove myself from virtio-gpu entries.
Flip status to "Orphan" for entries which have nobody else listed.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-ID: <20240528083858.836262-4-kraxel@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add support for the unix-line-discard readline action, which erases from
the cursor position up to the beginning of the line. The default
binding, C-u, was chosen.
This is useful to quickly erase command input while working on the
monitor interface.
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <6772067e1c0d4b1c5310e5446e9e3e1c6b3b5bc0.1718265822.git.manos.pitsidianakis@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
C-n and C-p are the default bindings for readline's next-history and
previous-history respectively. They have the same functionality as the
Down and Up arrow keys.
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <9876594132d1f2e7210ab3f7ca01a82f95206447.1718265822.git.manos.pitsidianakis@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>