In the case of scsi-block, RESERVATION_CONFLICT is not a backend error,
but indicates that the guest tried to make a request that it isn't
allowed to execute. Pass the error to the guest so that it can decide
what to do with it.
Without this, if we stop the VM in response to a RESERVATION_CONFLICT
(as is the default policy in management software such as oVirt or
KubeVirt), it can happen that the VM cannot be resumed any more because
every attempt to resume it immediately runs into the same error and
stops the VM again.
One case that expects RESERVATION_CONFLICT errors to be visible in the
guest is running the validation tests in Windows 2019's Failover Cluster
Manager, which intentionally tries to execute invalid requests to see if
they are properly rejected.
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-5-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
scsi_block_sgio_complete() has surprising behaviour in that there are
error cases in which it directly completes the request and never calls
the passed callback. In the current state of the code, this doesn't seem
to result in bugs, but with future code changes, we must be careful to
never rely on the callback doing some cleanup until this code smell is
fixed. For now, just add warnings to make people aware of the trap.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of calling into scsi_handle_rw_error() directly from
scsi_block_sgio_complete() and skipping the normal callback, go through
the normal cleanup path by calling the callback with a positive error
value.
The important difference here is not only that the code path is cleaner,
but that the callbacks set r->req.aiocb = NULL. If we skip setting this
and the error action is BLOCK_ERROR_ACTION_STOP, resuming the VM runs
into an assertion failure in scsi_read_data() or scsi_write_data()
because the dangling aiocb pointer is unexpected.
Fixes: a108557bbf ("scsi: inline sg_io_sense_from_errno() into the callers.")
Buglink: https://issues.redhat.com/browse/RHEL-50000
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In some error cases, scsi_block_sgio_complete() never calls the passed
callback, but directly completes the request. This leads to bugs because
its error paths are not exact copies of what the callback would normally
do.
In preparation to fix this, allow passing positive return values to the
callbacks that represent the status code that should be used to complete
the request.
scsi_handle_rw_error() already handles positive values for its ret
parameter because scsi_block_sgio_complete() calls directly into it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240731123207.27636-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reset config values in the device RESET phase, not only once
when the device is realized, because otherwise the device can
use unknown values at reset.
Since we are adding a new reset method, use the preferred
Resettable API (for a simple leaf device reset, a
DeviceClass::reset method and a ResettableClass::reset_hold
method are essentially identical).
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240802213122.86852-3-philmd@linaro.org>
When booting Linux we see:
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x10000000-0x17ffffff]
pci_bus 0000:00: root bus resource [io 0x1000-0x1fffff]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
pci 0000:00:00.0: [11ab:4620] type 00 class 0x060000
pci 0000:00:00.0: [Firmware Bug]: reg 0x14: invalid BAR (can't size)
pci 0000:00:00.0: [Firmware Bug]: reg 0x18: invalid BAR (can't size)
pci 0000:00:00.0: [Firmware Bug]: reg 0x1c: invalid BAR (can't size)
pci 0000:00:00.0: [Firmware Bug]: reg 0x20: invalid BAR (can't size)
pci 0000:00:00.0: [Firmware Bug]: reg 0x24: invalid BAR (can't size)
This is due to missing base address register write mask.
Add it to get:
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x10000000-0x17ffffff]
pci_bus 0000:00: root bus resource [io 0x1000-0x1fffff]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
pci 0000:00:00.0: [11ab:4620] type 00 class 0x060000
pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x00000fff pref]
pci 0000:00:00.0: reg 0x14: [mem 0x01000000-0x01000fff pref]
pci 0000:00:00.0: reg 0x18: [mem 0x1c000000-0x1c000fff]
pci 0000:00:00.0: reg 0x1c: [mem 0x1f000000-0x1f000fff]
pci 0000:00:00.0: reg 0x20: [mem 0x1be00000-0x1be00fff]
pci 0000:00:00.0: reg 0x24: [io 0x14000000-0x14000fff]
Since this device is only used by MIPS machines which aren't
versioned, we don't need to update migration compat machinery.
Mention the datasheet referenced. Remove the "Malta assumptions
ahead" comment since the reset values from the datasheet are used.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20240802213122.86852-2-philmd@linaro.org>
If the user tries to use the virtio-gpu-gl device but the display
backend doesn't have OpenGL support enabled, we currently print a
rather uninformative error message:
$ qemu-system-aarch64 -M virt -device virtio-gpu-gl
qemu-system-aarch64: -device virtio-gpu-gl: opengl is not available
Since OpenGL is not enabled on display frontends by default, users
are quite likely to run into this. Improve the error message to
be more specific and to suggest to the user a path forward.
Note that the case of "user tried to enable OpenGL but the display
backend doesn't handle it" is caught elsewhere first, so we can
assume that isn't the problem:
$ qemu-system-aarch64 -M virt -device virtio-gpu-gl -display curses,gl=on
qemu-system-aarch64: OpenGL is not supported by the display
(Use of error_append_hint() requires us to add an ERRP_GUARD() to
the function, as noted in include/qapi/error.h.)
With this commit we now produce the hopefully more helpful error:
$ ./build/x86/qemu-system-aarch64 -M virt -device virtio-gpu-gl
qemu-system-aarch64: -device virtio-gpu-gl: The display backend does not have OpenGL support enabled
It can be enabled with '-display BACKEND,gl=on' where BACKEND is the name of the display backend to use.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2443
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-ID: <20240731154136.3494621-2-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Coverity notes that the code at the end of the loop in
bmdma_prepare_buf() is unreachable. This is because in commit
9fbf0fa81f ("ide: remove hardcoded 2GiB transactional limit")
we removed the only codepath in the loop which could "break" out of
it, but didn't notice that this meant we should also remove the code
at the end of the loop.
Remove the dead code.
Resolves: Coverity CID 1547772
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[PMD: Break and return once at EOF]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240805182419.22239-1-philmd@linaro.org>
Coverity complains about an overflow in isa_fdc_get_drive_max_chs()
that can happen if the loop over fd_formats never finds a match,
because we initialize *maxc to 0 and then at the end of the
function decrement it.
This can't ever actually happen because fd_formats has at least
one entry for each FloppyDriveType, so we must at least once
find a match and update *maxc, *maxh and *maxs. Assert that we
did find a match, which should keep Coverity happy and will also
detect possible bugs in the data in fd_formats.
Resolves: Coverity CID 1547663
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240731143617.3391947-6-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
In ide_atapi_cmd_reply_end() we calculate a 16-bit size, and then
assign its two halves to s->lcyl and s->hcyl like this:
s->lcyl = size;
s->hcyl = size >> 8;
Coverity warns that the first line here can overflow the
8-bit s->lcyl variable. This is true, and in this case we're
deliberately only after the low 8 bits of the value. The
code is clearer to both humans and Coverity if we're explicit
that we only wanted the low 8 bits, though.
Resolves: Coverity CID 1547621
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240731143617.3391947-5-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
In pflash_write() Coverity points out that we can decrement the
unsigned pfl->counter below zero, which makes it wrap around. In
fact this is harmless, because if pfl->counter is 0 at this point we
also increment pfl->wcycle to 3, and the wcycle == 3 handling doesn't
look at counter; the only way back into code which looks at the
counter value is via wcycle == 1, which will reinitialize the counter.
But it's arguably a little clearer to break early in the "counter ==
0" if(), to avoid the decrement-below-zero.
Resolves: Coverity CID 1547611
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240731143617.3391947-4-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
We neglected to clear the @data_count index on ADMA error,
allowing to trigger assertion in sdhci_read_dataport() or
sdhci_write_dataport().
Cc: qemu-stable@nongnu.org
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2455
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240730092138.32443-4-philmd@linaro.org>
Guest should not try to read the DAT lines from invalid
command state. If it still insists to do so, return a
dummy value.
Cc: qemu-stable@nongnu.org
Fixes: e2dec2eab0 ("hw/sd/sdcard: Remove default case in read/write on DAT lines")
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2454
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240730092138.32443-3-philmd@linaro.org>
On error the DAT lines are left unmodified to their
previous states. QEMU returns 0x00 for convenience.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240730092138.32443-2-philmd@linaro.org>
Now than LoongArch target can use the TYPE_LOONGARCH_IPI
model, restrict TYPE_LOONGSON_IPI to MIPS.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-15-philmd@linaro.org>
Loongarch IPI inherits from class LoongsonIPICommonClass, and it
only contains Loongarch 3A5000 virt machine specific interfaces,
rather than mix different machine implementations together.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Rebased]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-14-philmd@linaro.org>
Loongarch IPI is added here, it inherits from class
TYPE_LOONGSON_IPI_COMMON, and two interfaces get_iocsr_as() and
cpu_by_arch_id() are added for Loongarch 3A5000 machine. It can
be used when ipi is emulated in userspace with KVM mode.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Rebased and simplified]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-13-philmd@linaro.org>
Move the common code from loongson_ipi.c to loongson_ipi_common.c,
call parent_realize() instead of loongson_ipi_common_realize() in
loongson_ipi_realize().
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-12-philmd@linaro.org>
In order to access loongson_ipi_core_read/write helpers
from loongson_ipi_common.c in the next commit, make their
prototype declaration public.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-11-philmd@linaro.org>
Allow Loongson IPI implementations to have their own
cpu_by_arch_id() handler.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-10-philmd@linaro.org>
Allow Loongson IPI implementations to have their own get_iocsr_as()
handler.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-9-philmd@linaro.org>
In order to get LoongsonIPICommonClass in send_ipi_data()
in the next commit, propagate LoongsonIPICommonState.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-8-philmd@linaro.org>
Move the IPICore structure and corresponding common fields
of LoongsonIPICommonState to "hw/intc/loongson_ipi_common.h".
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-7-philmd@linaro.org>
It is easier to manage one array of MMIO MR rather
than one per vCPU.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-6-philmd@linaro.org>
Introduce LOONGSON_IPI_COMMON stubs, QDev parent of LOONGSON_IPI.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-4-philmd@linaro.org>
In preparation to extract common IPI code in few commits,
extract loongson_ipi_common_realize().
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-3-philmd@linaro.org>
We'll have to add LoongsonIPIClass in few commits,
so rename LoongsonIPI as LoongsonIPIState for clarity.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Tested-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Message-Id: <20240805180622.21001-2-philmd@linaro.org>
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.
The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.
Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
to int, it returns an opaque that represents the shadow_avail_idx
of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
it takes an opaque as input arg which is returned from
virtqueue_get_avail_bytes(). It enables notification firstly,
then checks whether the guest has added some buffers since
last check of available buffers or not by virtio_queue_poll(),
return ture if yes.
The patch also reverts patch "06b12970174".
The case below can reproduce the stall.
Guest 0
+--------+
| iperf |
---------------> | server |
Host | +--------+
+--------+ | ...
| iperf |----
| client |---- Guest n
+--------+ | +--------+
| | iperf |
---------------> | server |
+--------+
Boot many guests from qemu with virtio network:
qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
Each guest acts as iperf server with commands below:
iperf3 -s -D -i 10 -p 8001
iperf3 -s -D -i 10 -p 8002
The host as iperf client:
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.
It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
| need kick the host?
| NAPI continues
v
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
v
... ...
On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.
Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.
The host may stall forever at the sequences below:
Host Guest
------------ -----------
fetch buf, send packet receive packet ---
... ... |
fetch buf, send packet add buf |
... add buf virtnet_poll
buf not enough avail idx-> add buf |
read avail idx add buf |
add buf ---
receive packet ---
write event idx ... |
wait for kick add buf virtnet_poll
... |
---
no more packet, exit NAPI
In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
step 1: The buf is not enough, for example, a big packet
needs 5 buf, but the available buf count is 3.
The host read current avail idx.
step 2: The guest adds some buf, then checks whether the
host is waiting for kick signal, not at this time.
The used ring is not empty, the guest continues
the second loop of NAPI.
step 3: The host writes the avail idx read from avail
ring to used ring as event idx via
virtio_queue_set_notification(q->rx_vq, 1).
step 4: At the end of the second loop of NAPI, recheck
whether kick is needed, as the event idx in the
used ring written by the host is beyound the
range of kick condition, the guest will not
send kick signal to the host.
Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Ensure the queue index points to a valid queue when software RSS
enabled. The new calculation matches with the behavior of Linux's TAP
device with the RSS eBPF program.
Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Old linux kernel rtl8139 drivers (ex. debian 2.1) uses outb to set the rx
mode for RxConfig. Unfortunatelly qemu does not support outb for RxConfig.
Signed-off-by: Hans <sungdgdhtryrt@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If VT-D hardware supports scalable mode, Linux will set the IQA DW field
(bit11). In qemu, the vtd_mem_write and vtd_update_iq_dw set DW field well.
However, vtd_mem_read the DW field wrong because "& VTD_IQA_QS" dropped the
value of DW.
Replace "&VTD_IQA_QS" with "& (VTD_IQA_QS | VTD_IQA_DW_MASK)" could save
the DW field.
Test patch as below:
config the "x-scalable-mode" option:
"-device intel-iommu,caching-mode=on,x-scalable-mode=on,aw-bits=48"
After Linux OS boot, check the IQA_REG DW Field by usage 1 or 2:
1. IOMMU_DEBUGFS:
Before fix:
cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA
IQA 0x90 0x00000001001da001
After fix:
cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA
IQA 0x90 0x00000001001da801
Check DW field(bit11) is 1.
2. devmem2 read the IQA_REG (offset 0x90):
Before fix:
devmem2 0xfed90090
/dev/mem opened.
Memory mapped at address 0x7f72c795b000.
Value at address 0xFED90090 (0x7f72c795b090): 0x1DA001
After fix:
devmem2 0xfed90090
/dev/mem opened.
Memory mapped at address 0x7fc95281c000.
Value at address 0xFED90090 (0x7fc95281c090): 0x1DA801
Check DW field(bit11) is 1.
Signed-off-by: yeeli <seven.yi.lee@gmail.com>
Message-Id: <20240725031858.1529902-1-seven.yi.lee@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In amdvi_update_iotlb() we will only put a new entry in the hash
table if to_cache.perm is not IOMMU_NONE. However we allocate the
memory for the new AMDVIIOTLBEntry and for the hash table key
regardless. This means that in the IOMMU_NONE case we will leak the
memory we alloacted.
Move the allocations into the if() to the point where we know we're
going to add the item to the hash table.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2452
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20240731170019.3590563-1-peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>