How this `abort` was intended to check for was:
- if the `mapping->first_mapping_index` is not the same as
`first_mapping_index`, which **should** happen only in one case,
when we are handling the first mapping, in that case
`mapping->first_mapping_index == -1`, in all other cases, the other
mappings after the first should have the condition `true`.
- From above, we know that this is the first mapping, so if the offset
is not `0`, then abort, since this is an invalid state.
The issue was that `first_mapping_index` is not set if we are
checking from the middle, the variable `first_mapping_index` is
only set if we passed through the check `cluster_was_modified` with the
first mapping, and in the same function call we checked the other
mappings.
One approach is to go into the loop even if `cluster_was_modified`
is not true so that we will be able to set `first_mapping_index` for the
first mapping, but since `first_mapping_index` is only used here,
another approach is to just check manually for the
`mapping->first_mapping_index != -1` since we know that this is the
value for the only entry where `offset == 0` (i.e. first mapping).
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <b0fbca3ee208c565885838f6a7deeaeb23f4f9c2.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The field is marked as "the offset in the file (in clusters)", but it
was being used like this
`cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before this commit, the behavior when calling `commit_one_file` for
example with `offset=0x2000` (second cluster), what will happen is that
we won't fetch the next cluster from the fat, and instead use the first
cluster for the read operation.
This is due to off-by-one error here, where `i=0x2000 !< offset=0x2000`,
thus not fetching the next cluster.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <b97c1e1f1bc2f776061ae914f95d799d124fcd73.1721470238.git.amjadsharafi10@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the case of scsi-block, RESERVATION_CONFLICT is not a backend error,
but indicates that the guest tried to make a request that it isn't
allowed to execute. Pass the error to the guest so that it can decide
what to do with it.
Without this, if we stop the VM in response to a RESERVATION_CONFLICT
(as is the default policy in management software such as oVirt or
KubeVirt), it can happen that the VM cannot be resumed any more because
every attempt to resume it immediately runs into the same error and
stops the VM again.
One case that expects RESERVATION_CONFLICT errors to be visible in the
guest is running the validation tests in Windows 2019's Failover Cluster
Manager, which intentionally tries to execute invalid requests to see if
they are properly rejected.
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
scsi_block_sgio_complete() has surprising behaviour in that there are
error cases in which it directly completes the request and never calls
the passed callback. In the current state of the code, this doesn't seem
to result in bugs, but with future code changes, we must be careful to
never rely on the callback doing some cleanup until this code smell is
fixed. For now, just add warnings to make people aware of the trap.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of calling into scsi_handle_rw_error() directly from
scsi_block_sgio_complete() and skipping the normal callback, go through
the normal cleanup path by calling the callback with a positive error
value.
The important difference here is not only that the code path is cleaner,
but that the callbacks set r->req.aiocb = NULL. If we skip setting this
and the error action is BLOCK_ERROR_ACTION_STOP, resuming the VM runs
into an assertion failure in scsi_read_data() or scsi_write_data()
because the dangling aiocb pointer is unexpected.
Fixes: a108557bbf ("scsi: inline sg_io_sense_from_errno() into the callers.")
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In some error cases, scsi_block_sgio_complete() never calls the passed
callback, but directly completes the request. This leads to bugs because
its error paths are not exact copies of what the callback would normally
do.
In preparation to fix this, allow passing positive return values to the
callbacks that represent the status code that should be used to complete
the request.
scsi_handle_rw_error() already handles positive values for its ret
parameter because scsi_block_sgio_complete() calls directly into it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Upstream clang 18 (and backports to clang 17 in Fedora and RHEL)
implemented support for __attribute__((cleanup())) in its Thread Safety
Analysis, so we can now actually have a proper implementation of
WITH_GRAPH_RDLOCK_GUARD() that understands when we acquire and when we
release the lock.
-Wthread-safety is now only enabled if the compiler is new enough to
understand this pattern. In theory, we could have used some #ifdefs to
keep the existing basic checks on old compilers, but as long as someone
runs a newer compiler (and our CI does), we will catch locking problems,
so it's probably not worth keeping multiple implementations for this.
The implementation can't use g_autoptr any more because the glib macros
define wrapper functions that don't have the right TSA attributes, so
the compiler would complain about them. Just use the cleanup attribute
directly instead.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240627181245.281403-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The graph lock needs to be held when calling bdrv_co_pdiscard(). Fix
block_copy_task_entry() to take it for the call.
WITH_GRAPH_RDLOCK_GUARD() was implemented in a weak way because of
limitations in clang's Thread Safety Analysis at the time, so that it
only asserts that the lock is held (which allows calling functions that
require the lock), but we never deal with the unlocking (so even after
the scope of the guard, the compiler assumes that the lock is still
held). This is why the compiler didn't catch this locking error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240627181245.281403-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockdevSnapshotInternal is the arguments type of command
blockdev-snapshot-internal-sync. Its doc comment contains this note:
# .. note:: In a transaction, if @name is empty or any snapshot matching
# @name exists, the operation will fail. Only some image formats
# support it; for example, qcow2, and rbd.
"In a transaction" is misleading, and "if @name is empty or any
snapshot matching @name exists, the operation will fail" is redundant
with the command's Errors documentation. Drop.
The remainder is fine. Move it to the command's doc comment, where it
is more prominently visible, with a slight rephrasing for clarity.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240718123609.3063055-1-armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A new minor version of OpenSBI was just released after our bump to
OpenSBI 1.5. It contains significant bug fixes that it's worth doing
a new update for QEMU 9.1.
Submodule roms/opensbi 455de672dd..43cace6c36:
> lib: sbi: check result of pmp_get() in is_pmp_entry_mapped()
> lib: sbi: fwft: fix incorrect size passed to sbi_zalloc()
> lib: sbi: dbtr: fix potential NULL pointer dereferences
> include: Adjust Sscofpmf mhpmevent mask for upper 8 bits
> lib: sbi_hsm: Save/restore menvcfg only when it exists
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20240805120259.1705016-2-dbarboza@ventanamicro.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Coverity complained about the possible out-of-bounds access with
counter_virt/counter_virt_prev because these two arrays are
accessed with privilege mode. However, these two arrays are accessed
only when virt is enabled. Thus, the privilege mode can't be M mode.
Add the asserts anyways to detect any wrong usage of these arrays
in the future.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Fixes: Coverity CID 1558459
Fixes: Coverity CID 1558462
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20240724-fixes-v1-1-4a64596b0d64@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
According to the risc-v specification:
"FLD and FSD are only guaranteed to execute atomically if the effective
address is naturally aligned and XLEN≥64."
We currently implement fld as MO_ATOM_IFALIGN when XLEN < 64, which does
not violate the rules. But it will hide some problems. So relax it to
MO_ATOM_NONE.
Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802072417.659-4-zhiwei_liu@linux.alibaba.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Zama16b loads and stores of no more than MXLEN bits defined in the F, D, and Q
extensions.
Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802072417.659-3-zhiwei_liu@linux.alibaba.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Compressed encodings also applies to zama16b.
https://github.com/riscv/riscv-isa-manual/pull/1557
Suggested-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802072417.659-2-zhiwei_liu@linux.alibaba.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Fixes higher-half address parsing for QMP commands
`[p]memsave`.
Signed-off-by: Josh Junon <junon@oro.sh>
Message-ID: <20240802140704.13591-1-junon@oro.sh>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Subject tweaked, and one PRId64 updated to PRIu64]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Sweep the entire documentation again. Last done in commit
209e64d9ed (qapi: Refill doc comments to conform to current
conventions).
To check the generated documentation does not change, I compared the
generated HTML before and after this commit with "wdiff -3". Finds no
differences. Comparing with diff is not useful, as the reflown
paragraphs are visible there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240729065220.860163-1-armbru@redhat.com>
[Straightforward conflict with commit 442110bc6f resolved]
Instead of using a slow implementation to close all open fd after
forking, use qemu_close_all_open_fd().
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802145423.3232974-6-cleger@rivosinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In order for this function to be usable by tap.c code, add a list of
file descriptors that should not be closed.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Message-ID: <20240802145423.3232974-5-cleger@rivosinc.com>
[rth: Use max_fd in qemu_close_all_open_fd_close_range]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The same code is used twice to actually close all open file descriptors
after forking. Factorize it in a single place.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802145423.3232974-4-cleger@rivosinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In order to make it cleaner, split qemu_close_all_open_fd() logic into
multiple subfunctions (close with close_range(), with /proc/self/fd and
fallback).
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802145423.3232974-3-cleger@rivosinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move close_all_open_fds() in oslib-posix, rename it
qemu_close_all_open_fds() and export it.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240802145423.3232974-2-cleger@rivosinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Analyzing qemu-produced core dumps of multi-threaded apps runs into:
(gdb) info threads
[...]
21 Thread 0x3ff83cc0740 (LWP 9295) warning: Couldn't find general-purpose registers in core file.
<unavailable> in ?? ()
The reason is that all pr_pid values are the same, because the same
TaskState is used for all CPUs when generating NT_PRSTATUS notes.
Fix by using TaskStates associated with individual CPUs.
Cc: qemu-stable@nongnu.org
Fixes: 243c470662 ("linux-user/elfload: Write corefile elf header in one block")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240801202340.21845-1-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Changed val from uint64_t to a pointer to uint64_t in hvf_sysreg_read,
but didn't change its usage in hvf_sysreg_read_cp call.
Fixes: e9e640148c ("hvf: arm: Raise an exception for sysreg by default")
Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240802-hvf-v1-1-e2c0292037e5@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When a channel fails to create, the code currently just returns. This
is wrong for two reasons:
1) Channel n+1 will not get to initialize it's semaphores, leading to
an assert when terminate_threads tries to post to it:
qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
2) (theoretical) If channel n-1 already started creation it will
defeat the purpose of the channels_created logic which is in place
to avoid migrate_fd_cleanup() to run while channels are still being
created.
This cannot really happen today because the current failure cases
for multifd_new_send_channel_create() are all synchronous,
resulting from qio_channel_file_new_path() getting a bad
filename. This would hit all channels equally.
But I don't want to set a trap for future people, so have all
channels try to create (even if failing), and only fail after the
channels_created semaphore has been posted.
While here, remove the error_report_err call. There's one already at
migrate_fd_cleanup later on.
Cc: qemu-stable@nongnu.org
Reported-by: Jim Fehlig <jfehlig@suse.com>
Fixes: b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The QIOChannelFile object already has its reference decremented by
g_autoptr. Trying to unref an extra time causes:
ERROR:../qom/object.c:1241:object_unref: assertion failed: (obj->ref > 0)
Fixes: a701c03dec ("migration: Drop reference to QIOChannel if file seeking fails")
Fixes: 6d3279655a ("migration: Fix file migration with fdset")
Reported-by: Jim Fehlig <jfehlig@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The vcek-disabled property of the sev-snp-guest object is misspelled
vcek-required (which I suppose would use the opposite polarity) in
the call to object_class_property_add_bool(). Fix it.
Reported-by: Zixi Chen <zixchen@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmasTgwACgkQ7wSWWzmN
YhFUtAgAq45v7fQJ7cKKwRam/VrIkxT5cM59ODwzLSL9kPWfL6f/bJ7xM/zvLyvn
LNBXFWWu+eNKA73f95cckZwaqZ4U6giGbiesCACn1IpgVtieLS+Lq78jsifKIAsR
yxFvbT9oLhU0dZ1Up3+isc6V+jeAE4ZYu4KOiIt7PscTEzkJl+vSUjN4X9rRVtUD
PzONUacL6MoTJtX8UZJZXNzLN9JTsN39Gx+LSDGQ27MDmDvE3R9BW+T0ZgF9JQZ7
wnrL5sharqF3gxa7X55fPBI1qwY5gWcH0yyJpRdM8guA13vhtvlrhNSypip9eKWi
HtPHUTKEB5YOvF236WRiuQPIm/GNpA==
=7HGN
-----END PGP SIGNATURE-----
Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEIV1G9IJGaJ7HfzVi7wSWWzmNYhEFAmasTgwACgkQ7wSWWzmN
# YhFUtAgAq45v7fQJ7cKKwRam/VrIkxT5cM59ODwzLSL9kPWfL6f/bJ7xM/zvLyvn
# LNBXFWWu+eNKA73f95cckZwaqZ4U6giGbiesCACn1IpgVtieLS+Lq78jsifKIAsR
# yxFvbT9oLhU0dZ1Up3+isc6V+jeAE4ZYu4KOiIt7PscTEzkJl+vSUjN4X9rRVtUD
# PzONUacL6MoTJtX8UZJZXNzLN9JTsN39Gx+LSDGQ27MDmDvE3R9BW+T0ZgF9JQZ7
# wnrL5sharqF3gxa7X55fPBI1qwY5gWcH0yyJpRdM8guA13vhtvlrhNSypip9eKWi
# HtPHUTKEB5YOvF236WRiuQPIm/GNpA==
# =7HGN
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 02 Aug 2024 01:10:04 PM AEST
# gpg: using RSA key 215D46F48246689EC77F3562EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* tag 'net-pull-request' of https://github.com/jasowang/qemu:
net: Reinstate '-net nic, model=help' output as documented in man page
net: update netdev stream man page with the reconnect parameter
net: update netdev dgram man page with unix socket
net: update netdev stream man page with unix socket
net: update netdev stream/dgram man page
virtio-net: Fix network stall at the host side waiting for kick
virtio-net: Ensure queue index fits with RSS
rtl8139: Fix behaviour for old kernels.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
While refactoring the NIC initialization code, I broke '-net nic,model=help'
which no longer outputs a list of available NIC models.
Fixes: 2cdeca04ad ("net: report list of available models according to platform")
Cc: qemu-stable@nongnu.org
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Jason Wang <jasowang@redhat.com>
"-netdev stream" supports a reconnect parameter that attempts to
reconnect automatically the socket if it is disconnected. The code
has been added but the man page has not been updated.
Fixes: 148fbf0d58 ("net: stream: add a new option to automatically reconnect"
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add the description of "-netdev dgram" with a unix domain socket.
The code has been added but the man page has not been updated.
Fixes: 784e7a2531 ("net: dgram: add unix socket")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add the description of "-netdev stream" with a unix domain socket.
The code has been added but the man page has not been updated.
Include an example how to use "-netdev stream" and "passt" in place
of "-netdev user".
("passt" is a non privileged translation proxy between layer-2, like
"-netdev stream", and layer-4 on host, like TCP, UDP, ICMP/ICMPv6 echo)
Fixes: 13c6be9661 ("net: stream: add unix socket")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add the description of "-netdev stream" and "-netdev dgram" in the QEMU
manpage.
Add some examples on how to use them.
Fixes: 5166fe0ae4 ("qapi: net: add stream and dgram netdevs")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.
The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.
Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
to int, it returns an opaque that represents the shadow_avail_idx
of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
it takes an opaque as input arg which is returned from
virtqueue_get_avail_bytes(). It enables notification firstly,
then checks whether the guest has added some buffers since
last check of available buffers or not by virtio_queue_poll(),
return ture if yes.
The patch also reverts patch "06b12970174".
The case below can reproduce the stall.
Guest 0
+--------+
| iperf |
---------------> | server |
Host | +--------+
+--------+ | ...
| iperf |----
| client |---- Guest n
+--------+ | +--------+
| | iperf |
---------------> | server |
+--------+
Boot many guests from qemu with virtio network:
qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
Each guest acts as iperf server with commands below:
iperf3 -s -D -i 10 -p 8001
iperf3 -s -D -i 10 -p 8002
The host as iperf client:
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.
It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
| need kick the host?
| NAPI continues
v
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
v
... ...
On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.
Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.
The host may stall forever at the sequences below:
Host Guest
------------ -----------
fetch buf, send packet receive packet ---
... ... |
fetch buf, send packet add buf |
... add buf virtnet_poll
buf not enough avail idx-> add buf |
read avail idx add buf |
add buf ---
receive packet ---
write event idx ... |
wait for kick add buf virtnet_poll
... |
---
no more packet, exit NAPI
In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
step 1: The buf is not enough, for example, a big packet
needs 5 buf, but the available buf count is 3.
The host read current avail idx.
step 2: The guest adds some buf, then checks whether the
host is waiting for kick signal, not at this time.
The used ring is not empty, the guest continues
the second loop of NAPI.
step 3: The host writes the avail idx read from avail
ring to used ring as event idx via
virtio_queue_set_notification(q->rx_vq, 1).
step 4: At the end of the second loop of NAPI, recheck
whether kick is needed, as the event idx in the
used ring written by the host is beyound the
range of kick condition, the guest will not
send kick signal to the host.
Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Ensure the queue index points to a valid queue when software RSS
enabled. The new calculation matches with the behavior of Linux's TAP
device with the RSS eBPF program.
Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Old linux kernel rtl8139 drivers (ex. debian 2.1) uses outb to set the rx
mode for RxConfig. Unfortunatelly qemu does not support outb for RxConfig.
Signed-off-by: Hans <sungdgdhtryrt@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
revert virtio pci/SR-IOV emulation at author's request
a couple of fixes in virtio,vtd
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmarSFUPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp7fwH/3wNCGhgHhF5dhKRKRn8hqhxYl2rXnv0LKYI
Rgsoxh3kw6oKBXxLG/B4V2GkqDSU8q8NuHnvGmmAUQ/uHmwTWbBbrZ+HwMMmaRhT
Ox8kIXiVYAtw24yLKDvyoKbMLjLKb9/QqTT4rbsQ9yl5PLxwoGGJEu/ifM1MbZZY
f5CDtj3hRArIZEjMt0Q3h+G7///BRVZxQ/0de57whGXcr349qgMpiIThvlCOj7Yf
rQ68AGS4yk1Jk0oxiYyWjo43o8JbB5bMnCrkzDy4ZdY5Sw9zGb48CmcrBUl4J9lv
NVDYK63dsvRS0ew7PxaEwu32MIQLJcn5s521m81/ZAhbdyzLnlI=
=/2+K
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci,pc: fixes
revert virtio pci/SR-IOV emulation at author's request
a couple of fixes in virtio,vtd
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmarSFUPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp7fwH/3wNCGhgHhF5dhKRKRn8hqhxYl2rXnv0LKYI
# Rgsoxh3kw6oKBXxLG/B4V2GkqDSU8q8NuHnvGmmAUQ/uHmwTWbBbrZ+HwMMmaRhT
# Ox8kIXiVYAtw24yLKDvyoKbMLjLKb9/QqTT4rbsQ9yl5PLxwoGGJEu/ifM1MbZZY
# f5CDtj3hRArIZEjMt0Q3h+G7///BRVZxQ/0de57whGXcr349qgMpiIThvlCOj7Yf
# rQ68AGS4yk1Jk0oxiYyWjo43o8JbB5bMnCrkzDy4ZdY5Sw9zGb48CmcrBUl4J9lv
# NVDYK63dsvRS0ew7PxaEwu32MIQLJcn5s521m81/ZAhbdyzLnlI=
# =/2+K
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 01 Aug 2024 06:33:25 PM AEST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
intel_iommu: Fix for IQA reg read dropped DW field
hw/i386/amd_iommu: Don't leak memory in amdvi_update_iotlb()
Revert "hw/pci: Rename has_power to enabled"
Revert "hw/ppc/spapr_pci: Do not create DT for disabled PCI device"
Revert "hw/ppc/spapr_pci: Do not reject VFs created after a PF"
Revert "pcie_sriov: Do not manually unrealize"
Revert "pcie_sriov: Ensure VF function number does not overflow"
Revert "pcie_sriov: Reuse SR-IOV VF device instances"
Revert "pcie_sriov: Release VFs failed to realize"
Revert "pcie_sriov: Remove num_vfs from PCIESriovPF"
Revert "pcie_sriov: Register VFs after migration"
Revert "hw/pci: Fix SR-IOV VF number calculation"
Revert "pcie_sriov: Ensure PF and VF are mutually exclusive"
Revert "pcie_sriov: Check PCI Express for SR-IOV PF"
Revert "pcie_sriov: Allow user to create SR-IOV device"
Revert "virtio-pci: Implement SR-IOV PF"
Revert "virtio-net: Implement SR-IOV VF"
Revert "docs: Document composable SR-IOV device"
virtio-rng: block max-bytes=0
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In commit ad18376b90 we added an assert that the level value was
in-bounds for the array we're about to index into. However, the
assert condition is wrong -- env->config->interrupt_vector is an
array of uint32_t, so we should bounds check the index against
ARRAY_SIZE(...), not against sizeof().
Resolves: Coverity CID 1507131
Fixes: ad18376b90 ("target/xtensa: Assert that interrupt level is within bounds")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240731172246.3682311-1-peter.maydell@linaro.org
The FMOPA (widening) SME instruction takes pairs of half-precision
floating point values, widens them to single-precision, does a
two-way dot product and accumulates the results into a
single-precision destination. We don't quite correctly handle the
FPCR bits FZ and FZ16 which control flushing of denormal inputs and
outputs. This is because at the moment we pass a single float_status
value to the helper function, which then uses that configuration for
all the fp operations it does. However, because the inputs to this
operation are float16 and the outputs are float32 we need to use the
fp_status_f16 for the float16 input widening but the normal fp_status
for everything else. Otherwise we will apply the flushing control
FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and
incorrectly flush a denormal output to zero when we should not (or
vice-versa).
(In commit 207d30b5fd we tried to fix the FZ handling but
didn't get it right, switching from "use FPCR.FZ for everything" to
"use FPCR.FZ16 for everything".)
Pass the CPU env to the sme_fmopa_h helper instead of an fp_status
pointer, and have the helper pass an extra fp_status into the
f16_dotadd() function so that we can use the right status for the
right parts of this operation.
Cc: qemu-stable@nongnu.org
Fixes: 207d30b5fd ("target/arm: Use FPST_F16 for SME FMOPA (widening)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
If VT-D hardware supports scalable mode, Linux will set the IQA DW field
(bit11). In qemu, the vtd_mem_write and vtd_update_iq_dw set DW field well.
However, vtd_mem_read the DW field wrong because "& VTD_IQA_QS" dropped the
value of DW.
Replace "&VTD_IQA_QS" with "& (VTD_IQA_QS | VTD_IQA_DW_MASK)" could save
the DW field.
Test patch as below:
config the "x-scalable-mode" option:
"-device intel-iommu,caching-mode=on,x-scalable-mode=on,aw-bits=48"
After Linux OS boot, check the IQA_REG DW Field by usage 1 or 2:
1. IOMMU_DEBUGFS:
Before fix:
cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA
IQA 0x90 0x00000001001da001
After fix:
cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA
IQA 0x90 0x00000001001da801
Check DW field(bit11) is 1.
2. devmem2 read the IQA_REG (offset 0x90):
Before fix:
devmem2 0xfed90090
/dev/mem opened.
Memory mapped at address 0x7f72c795b000.
Value at address 0xFED90090 (0x7f72c795b090): 0x1DA001
After fix:
devmem2 0xfed90090
/dev/mem opened.
Memory mapped at address 0x7fc95281c000.
Value at address 0xFED90090 (0x7fc95281c090): 0x1DA801
Check DW field(bit11) is 1.
Signed-off-by: yeeli <seven.yi.lee@gmail.com>
Message-Id: <20240725031858.1529902-1-seven.yi.lee@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>