mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Eric Blake	3066a540b5	nbd/server: CVE-2024-7409: Avoid use-after-free when closing server Commit `3e7ef738` plugged the use-after-free of the global nbd_server object, but overlooked a use-after-free of nbd_server->listener. Although this race is harder to hit, notice that our shutdown path first drops the reference count of nbd_server->listener, then triggers actions that can result in a pending client reaching the nbd_blockdev_client_closed() callback, which in turn calls qio_net_listener_set_client_func on a potentially stale object. If we know we don't want any more clients to connect, and have already told the listener socket to shut down, then we should not be trying to update the listener socket's associated function. Reproducer: > #!/usr/bin/python3 > > import os > from threading import Thread > > def start_stop(): > while 1: > os.system('virsh qemu-monitor-command VM \'{"execute": "nbd-server-start", +"arguments":{"addr":{"type":"unix","data":{"path":"/tmp/nbd-sock"}}}}\'') > os.system('virsh qemu-monitor-command VM \'{"execute": "nbd-server-stop"}\'') > > def nbd_list(): > while 1: > os.system('/path/to/build/qemu-nbd -L -k /tmp/nbd-sock') > > def test(): > sst = Thread(target=start_stop) > sst.start() > nlt = Thread(target=nbd_list) > nlt.start() > > sst.join() > nlt.join() > > test() Fixes: CVE-2024-7409 Fixes: `3e7ef738c8` ("nbd/server: CVE-2024-7409: Close stray clients at server-stop") CC: qemu-stable@nongnu.org Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240822143617.800419-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> (cherry picked from commit `3874f5f73c`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Eric Blake	73a5a92259	nbd/server: CVE-2024-7409: Close stray clients at server-stop A malicious client can attempt to connect to an NBD server, and then intentionally delay progress in the handshake, including if it does not know the TLS secrets. Although the previous two patches reduce this behavior by capping the default max-connections parameter and killing slow clients, they did not eliminate the possibility of a client waiting to close the socket until after the QMP nbd-server-stop command is executed, at which point qemu would SEGV when trying to dereference the NULL nbd_server global which is no longer present. This amounts to a denial of service attack. Worse, if another NBD server is started before the malicious client disconnects, I cannot rule out additional adverse effects when the old client interferes with the connection count of the new server (although the most likely is a crash due to an assertion failure when checking nbd_server->connections > 0). For environments without this patch, the CVE can be mitigated by ensuring (such as via a firewall) that only trusted clients can connect to an NBD server. Note that using frameworks like libvirt that ensure that TLS is used and that nbd-server-stop is not executed while any trusted clients are still connected will only help if there is also no possibility for an untrusted client to open a connection but then stall on the NBD handshake. Given the previous patches, it would be possible to guarantee that no clients remain connected by having nbd-server-stop sleep for longer than the default handshake deadline before finally freeing the global nbd_server object, but that could make QMP non-responsive for a long time. So intead, this patch fixes the problem by tracking all client sockets opened while the server is running, and forcefully closing any such sockets remaining without a completed handshake at the time of nbd-server-stop, then waiting until the coroutines servicing those sockets notice the state change. nbd-server-stop now has a second AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the blk_exp_close_all_type() that disconnects all clients that completed handshakes), but forced socket shutdown is enough to progress the coroutines and quickly tear down all clients before the server is freed, thus finally fixing the CVE. This patch relies heavily on the fact that nbd/server.c guarantees that it only calls nbd_blockdev_client_closed() from the main loop (see the assertion in nbd_client_put() and the hoops used in nbd_client_put_nonzero() to achieve that); if we did not have that guarantee, we would also need a mutex protecting our accesses of the list of connections to survive re-entrancy from independent iothreads. Although I did not actually try to test old builds, it looks like this problem has existed since at least commit `862172f45c` (v2.12.0, 2017) - even back when that patch started using a QIONetListener to handle listening on multiple sockets, nbd_server_free() was already unaware that the nbd_blockdev_client_closed callback can be reached later by a client thread that has not completed handshakes (and therefore the client's socket never got added to the list closed in nbd_export_close_all), despite that patch intentionally tearing down the QIONetListener to prevent new clients. Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Fixes: CVE-2024-7409 CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-14-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (cherry picked from commit `3e7ef738c8`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Eric Blake	f4c1adb228	nbd/server: CVE-2024-7409: Drop non-negotiating clients A client that opens a socket but does not negotiate is merely hogging qemu's resources (an open fd and a small amount of memory); and a malicious client that can access the port where NBD is listening can attempt a denial of service attack by intentionally opening and abandoning lots of unfinished connections. The previous patch put a default bound on the number of such ongoing connections, but once that limit is hit, no more clients can connect (including legitimate ones). The solution is to insist that clients complete handshake within a reasonable time limit, defaulting to 10 seconds. A client that has not successfully completed NBD_OPT_GO by then (including the case of where the client didn't know TLS credentials to even reach the point of NBD_OPT_GO) is wasting our time and does not deserve to stay connected. Later patches will allow fine-tuning the limit away from the default value (including disabling it for doing integration testing of the handshake process itself). Note that this patch in isolation actually makes it more likely to see qemu SEGV after nbd-server-stop, as any client socket still connected when the server shuts down will now be closed after 10 seconds rather than at the client's whims. That will be addressed in the next patch. For a demo of this patch in action: $ qemu-nbd -f raw -r -t -e 10 file & $ nbdsh --opt-mode -c ' H = list() for i in range(20): print(i) H.insert(i, nbd.NBD()) H[i].set_opt_mode(True) H[i].connect_uri("nbd://localhost") ' $ kill $! where later connections get to start progressing once earlier ones are forcefully dropped for taking too long, rather than hanging. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-13-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: rebase to changes earlier in series, reduce scope of timer] Signed-off-by: Eric Blake <eblake@redhat.com> (cherry picked from commit `b9b72cb3ce`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Eric Blake	9bdb40b2d4	nbd/server: CVE-2024-7409: Cap default max-connections to 100 Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely needs to open lots of sockets without closing them, until qemu no longer has any more fds available to allocate. For qemu-nbd, we default to allowing only 1 connection unless more are explicitly asked for (-e or --shared); this was historically picked as a nice default (without an explicit -t, a non-persistent qemu-nbd goes away after a client disconnects, without needing any additional follow-up commands), and we are not going to change that interface now (besides, someday we want to point people towards qemu-storage-daemon instead of qemu-nbd). But for qemu proper, and the newer qemu-storage-daemon, the QMP nbd-server-start command has historically had a default of unlimited number of connections, in part because unlike qemu-nbd it is inherently persistent until nbd-server-stop. Allowing multiple client sockets is particularly useful for clients that can take advantage of MULTI_CONN (creating parallel sockets to increase throughput), although known clients that do so (such as libnbd's nbdcopy) typically use only 8 or 16 connections (the benefits of scaling diminish once more sockets are competing for kernel attention). Picking a number large enough for typical use cases, but not unlimited, makes it slightly harder for a malicious client to perform a denial of service merely by opening lots of connections withot progressing through the handshake. This change does not eliminate CVE-2024-7409 on its own, but reduces the chance for fd exhaustion or unlimited memory usage as an attack surface. On the other hand, by itself, it makes it more obvious that with a finite limit, we have the problem of an unauthenticated client holding 100 fds opened as a way to block out a legitimate client from being able to connect; thus, later patches will further add timeouts to reject clients that are not making progress. This is an INTENTIONAL change in behavior, and will break any client of nbd-server-start that was not passing an explicit max-connections parameter, yet expects more than 100 simultaneous connections. We are not aware of any such client (as stated above, most clients aware of MULTI_CONN get by just fine on 8 or 16 connections, and probably cope with later connections failing by relying on the earlier connections; libvirt has not yet been passing max-connections, but generally creates NBD servers with the intent for a single client for the sake of live storage migration; meanwhile, the KubeSAN project anticipates a large cluster sharing multiple clients [up to 8 per node, and up to 100 nodes in a cluster], but it currently uses qemu-nbd with an explicit --shared=0 rather than qemu-storage-daemon with nbd-server-start). We considered using a deprecation period (declare that omitting max-parameters is deprecated, and make it mandatory in 3 releases - then we don't need to pick an arbitrary default); that has zero risk of breaking any apps that accidentally depended on more than 100 connections, and where such breakage might not be noticed under unit testing but only under the larger loads of production usage. But it does not close the denial-of-service hole until far into the future, and requires all apps to change to add the parameter even if 100 was good enough. It also has a drawback that any app (like libvirt) that is accidentally relying on an unlimited default should seriously consider their own CVE now, at which point they are going to change to pass explicit max-connections sooner than waiting for 3 qemu releases. Finally, if our changed default breaks an app, that app can always pass in an explicit max-parameters with a larger value. It is also intentional that the HMP interface to nbd-server-start is not changed to expose max-connections (any client needing to fine-tune things should be using QMP). Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-12-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [ericb: Expand commit message to summarize Dan's argument for why we break corner-case back-compat behavior without a deprecation period] Signed-off-by: Eric Blake <eblake@redhat.com> (cherry picked from commit `c8a76dbd90`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Eric Blake	42f3a65e91	nbd/server: Plumb in new args to nbd_client_add() Upcoming patches to fix a CVE need to track an opaque pointer passed in by the owner of a client object, as well as request for a time limit on how fast negotiation must complete. Prepare for that by changing the signature of nbd_client_new() and adding an accessor to get at the opaque pointer, although for now the two servers (qemu-nbd.c and blockdev-nbd.c) do not change behavior even though they pass in a new default timeout value. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-11-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [eblake: s/LIMIT/MAX_SECS/ as suggested by Dan] Signed-off-by: Eric Blake <eblake@redhat.com> (cherry picked from commit `fb1c2aaa98`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Amjad Alsharafi	6622663a2c	iotests: Add `vvfat` tests Added several tests to verify the implementation of the vvfat driver. We needed a way to interact with it, so created a basic `fat16.py` driver that handled writing correct sectors for us. Added `vvfat` to the non-generic formats, as its not a normal image format. Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <bb8149c945301aefbdf470a0924c07f69f9c087d.1721470238.git.amjadsharafi10@gmail.com> [kwolf: Made mypy and pylint happy to unbreak 297] Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit `c8f60bfb43`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Amjad Alsharafi	a7c4ab770a	vvfat: Fix reading files with non-continuous clusters When reading with `read_cluster` we get the `mapping` with `find_mapping_for_cluster` and then we call `open_file` for this mapping. The issue appear when its the same file, but a second cluster that is not immediately after it, imagine clusters `500 -> 503`, this will give us 2 mappings one has the range `500..501` and another `503..504`, both point to the same file, but different offsets. When we don't open the file since the path is the same, we won't assign `s->current_mapping` and thus accessing way out of bound of the file. From our example above, after `open_file` (that didn't open anything) we will get the offset into the file with `s->cluster_size(cluster_num-s->current_mapping->begin)`, which will give us `0x2000 (504-500)`, which is out of bound for this mapping and will produce some issues. Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Message-ID: <1f3ea115779abab62ba32c788073cdc99f9ad5dd.1721470238.git.amjadsharafi10@gmail.com> [kwolf: Simplified the patch based on Amjad's analysis and input] Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit `5eed3db336`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:29 +03:00
Amjad Alsharafi	a32b8139b9	vvfat: Fix wrong checks for cluster mappings invariant How this `abort` was intended to check for was: - if the `mapping->first_mapping_index` is not the same as `first_mapping_index`, which should happen only in one case, when we are handling the first mapping, in that case `mapping->first_mapping_index == -1`, in all other cases, the other mappings after the first should have the condition `true`. - From above, we know that this is the first mapping, so if the offset is not `0`, then abort, since this is an invalid state. The issue was that `first_mapping_index` is not set if we are checking from the middle, the variable `first_mapping_index` is only set if we passed through the check `cluster_was_modified` with the first mapping, and in the same function call we checked the other mappings. One approach is to go into the loop even if `cluster_was_modified` is not true so that we will be able to set `first_mapping_index` for the first mapping, but since `first_mapping_index` is only used here, another approach is to just check manually for the `mapping->first_mapping_index != -1` since we know that this is the value for the only entry where `offset == 0` (i.e. first mapping). Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <b0fbca3ee208c565885838f6a7deeaeb23f4f9c2.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit `f60a6f7e17`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Amjad Alsharafi	073c9f96ac	vvfat: Fix usage of `info.file.offset` The field is marked as "the offset in the file (in clusters)", but it was being used like this `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect. Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit `21b25a0e46`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Amjad Alsharafi	382618a67e	vvfat: Fix bug in writing to middle of file Before this commit, the behavior when calling `commit_one_file` for example with `offset=0x2000` (second cluster), what will happen is that we won't fetch the next cluster from the fat, and instead use the first cluster for the read operation. This is due to off-by-one error here, where `i=0x2000 !< offset=0x2000`, thus not fetching the next cluster. Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <b97c1e1f1bc2f776061ae914f95d799d124fcd73.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit `b881cf00c9`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Philippe Mathieu-Daudé	9df1c0a555	hw/sd/sdhci: Reset @data_count index on invalid ADMA transfers We neglected to clear the @data_count index on ADMA error, allowing to trigger assertion in sdhci_read_dataport() or sdhci_write_dataport(). Cc: qemu-stable@nongnu.org Fixes: `d7dfca0807` ("hw/sdhci: introduce standard SD host controller") Reported-by: Zheyu Ma <zheyuma97@gmail.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2455 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240730092138.32443-4-philmd@linaro.org> (cherry picked from commit `ed5a159c3d`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Richard Henderson	f9b9a68b3e	tcg/ppc: Sync tcg_out_test and constraints Ensure the code structure is the same for matching constraints and emitting code, lest we allow constants that cannot be trivially tested. Cc: qemu-stable@nongnu.org Fixes: `ad788aebba` ("tcg/ppc: Support TCG_COND_TST{EQ,NE}") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2487 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <44328324-af73-4439-9d2b-d414e0e13dd7@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> (cherry picked from commit `682a052805`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Richard Henderson	73b491b792	target/i386: Fix VSIB decode With normal SIB, index == 4 indicates no index. With VSIB, there is no exception for VR4/VR12. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2474 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20240805003130.1421051-3-richard.henderson@linaro.org Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit `ac63755b20`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: modify the change to pre-new-decoder introduced past qemu 9.0)	2024-08-28 08:37:28 +03:00
Ilya Leoshkevich	ce15d843f1	linux-user/elfload: Fix pr_pid values in core files Analyzing qemu-produced core dumps of multi-threaded apps runs into: (gdb) info threads [...] 21 Thread 0x3ff83cc0740 (LWP 9295) warning: Couldn't find general-purpose registers in core file. <unavailable> in ?? () The reason is that all pr_pid values are the same, because the same TaskState is used for all CPUs when generating NT_PRSTATUS notes. Fix by using TaskStates associated with individual CPUs. Cc: qemu-stable@nongnu.org Fixes: `243c470662` ("linux-user/elfload: Write corefile elf header in one block") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240801202340.21845-1-iii@linux.ibm.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> (cherry picked from commit `5b0c2742c8`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Fabiano Rosas	986e253afd	migration/multifd: Fix multifd_send_setup cleanup when channel creation fails When a channel fails to create, the code currently just returns. This is wrong for two reasons: 1) Channel n+1 will not get to initialize it's semaphores, leading to an assert when terminate_threads tries to post to it: qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. 2) (theoretical) If channel n-1 already started creation it will defeat the purpose of the channels_created logic which is in place to avoid migrate_fd_cleanup() to run while channels are still being created. This cannot really happen today because the current failure cases for multifd_new_send_channel_create() are all synchronous, resulting from qio_channel_file_new_path() getting a bad filename. This would hit all channels equally. But I don't want to set a trap for future people, so have all channels try to create (even if failing), and only fail after the channels_created semaphore has been posted. While here, remove the error_report_err call. There's one already at migrate_fd_cleanup later on. Cc: qemu-stable@nongnu.org Reported-by: Jim Fehlig <jfehlig@suse.com> Fixes: `b7b03eb614` ("migration/multifd: Add outgoing QIOChannelFile support") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> (cherry picked from commit `0bd5b9284f`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
David Woodhouse	8636d5a32c	net: Reinstate '-net nic, model=help' output as documented in man page While refactoring the NIC initialization code, I broke '-net nic,model=help' which no longer outputs a list of available NIC models. Fixes: `2cdeca04ad` ("net: report list of available models according to platform") Cc: qemu-stable@nongnu.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit `64f75f57f9`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
thomas	6ce7791497	virtio-net: Fix network stall at the host side waiting for kick Patch `06b1297017` ("virtio-net: fix network stall under load") added double-check to test whether the available buffer size can satisfy the request or not, in case the guest has added some buffers to the avail ring simultaneously after the first check. It will be lucky if the available buffer size becomes okay after the double-check, then the host can send the packet to the guest. If the buffer size still can't satisfy the request, even if the guest has added some buffers, viritio-net would stall at the host side forever. The patch enables notification and checks whether the guest has added some buffers since last check of available buffers when the available buffers are insufficient. If no buffer is added, return false, else recheck the available buffers in the loop. If the available buffers are sufficient, disable notification and return true. Changes: 1. Change the return type of virtqueue_get_avail_bytes() from void to int, it returns an opaque that represents the shadow_avail_idx of the virtqueue on success, else -1 on error. 2. Add a new API: virtio_queue_enable_notification_and_check(), it takes an opaque as input arg which is returned from virtqueue_get_avail_bytes(). It enables notification firstly, then checks whether the guest has added some buffers since last check of available buffers or not by virtio_queue_poll(), return ture if yes. The patch also reverts patch "06b12970174". The case below can reproduce the stall. Guest 0 +--------+ \| iperf \| ---------------> \| server \| Host \| +--------+ +--------+ \| ... \| iperf \|---- \| client \|---- Guest n +--------+ \| +--------+ \| \| iperf \| ---------------> \| server \| +--------+ Boot many guests from qemu with virtio network: qemu ... -netdev tap,id=net_x \ -device virtio-net-pci-non-transitional,\ iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x Each guest acts as iperf server with commands below: iperf3 -s -D -i 10 -p 8001 iperf3 -s -D -i 10 -p 8002 The host as iperf client: iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000 After some time, the host loses connection to the guest, the guest can send packet to the host, but can't receive packet from the host. It's more likely to happen if SWIOTLB is enabled in the guest, allocating and freeing bounce buffer takes some CPU ticks, copying from/to bounce buffer takes more CPU ticks, compared with that there is no bounce buffer in the guest. Once the rate of producing packets from the host approximates the rate of receiveing packets in the guest, the guest would loop in NAPI. receive packets --- \| \| v \| free buf virtnet_poll \| \| v \| add buf to avail ring --- \| \| need kick the host? \| NAPI continues v receive packets --- \| \| v \| free buf virtnet_poll \| \| v \| add buf to avail ring --- \| v ... ... On the other hand, the host fetches free buf from avail ring, if the buf in the avail ring is not enough, the host notifies the guest the event by writing the avail idx read from avail ring to the event idx of used ring, then the host goes to sleep, waiting for the kick signal from the guest. Once the guest finds the host is waiting for kick singal (in virtqueue_kick_prepare_split()), it kicks the host. The host may stall forever at the sequences below: Host Guest ------------ ----------- fetch buf, send packet receive packet --- ... ... \| fetch buf, send packet add buf \| ... add buf virtnet_poll buf not enough avail idx-> add buf \| read avail idx add buf \| add buf --- receive packet --- write event idx ... \| wait for kick add buf virtnet_poll ... \| --- no more packet, exit NAPI In the first loop of NAPI above, indicated in the range of virtnet_poll above, the host is sending packets while the guest is receiving packets and adding buffers. step 1: The buf is not enough, for example, a big packet needs 5 buf, but the available buf count is 3. The host read current avail idx. step 2: The guest adds some buf, then checks whether the host is waiting for kick signal, not at this time. The used ring is not empty, the guest continues the second loop of NAPI. step 3: The host writes the avail idx read from avail ring to used ring as event idx via virtio_queue_set_notification(q->rx_vq, 1). step 4: At the end of the second loop of NAPI, recheck whether kick is needed, as the event idx in the used ring written by the host is beyound the range of kick condition, the guest will not send kick signal to the host. Fixes: `06b1297017` ("virtio-net: fix network stall under load") Cc: qemu-stable@nongnu.org Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit `f937309fbd`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context fixup in include/hw/virtio/virtio.h)	2024-08-28 08:37:28 +03:00
Akihiko Odaki	f8f1ea2a3c	virtio-net: Ensure queue index fits with RSS Ensure the queue index points to a valid queue when software RSS enabled. The new calculation matches with the behavior of Linux's TAP device with the RSS eBPF program. Fixes: `4474e37a5b` ("virtio-net: implement RX RSS processing") Reported-by: Zhibin Hu <huzhibin5@huawei.com> Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> (cherry picked from commit `f1595ceb9a`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Fixes: CVE-2024-6505	2024-08-28 08:37:28 +03:00
Peter Maydell	d639c0cb09	target/arm: Handle denormals correctly for FMOPA (widening) The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit `207d30b5fd` we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) (Mjt: it is commit `43929c818c` in stable-9.0) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: `207d30b5fd` ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> (cherry picked from commit `55f9f4ee01`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Marco Palumbi	65fee9ff07	hw/arm/mps2-tz.c: fix RX/TX interrupts order The order of the RX and TX interrupts are swapped. This commit fixes the order as per the following documents: * https://developer.arm.com/documentation/dai0505/latest/ * https://developer.arm.com/documentation/dai0521/latest/ * https://developer.arm.com/documentation/dai0524/latest/ * https://developer.arm.com/documentation/dai0547/latest/ Cc: qemu-stable@nongnu.org Signed-off-by: Marco Palumbi <Marco.Palumbi@tii.ae> Message-id: 20240730073123.72992-1-marco@palumbi.it Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit `5a558be93a`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	c15edb498d	hw/i386/amd_iommu: Don't leak memory in amdvi_update_iotlb() In amdvi_update_iotlb() we will only put a new entry in the hash table if to_cache.perm is not IOMMU_NONE. However we allocate the memory for the new AMDVIIOTLBEntry and for the hash table key regardless. This means that in the IOMMU_NONE case we will leak the memory we alloacted. Move the allocations into the if() to the point where we know we're going to add the item to the hash table. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2452 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20240731170019.3590563-1-peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `9a45b07616`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	d4d8259be5	docs/sphinx/depfile.py: Handle env.doc2path() returning a Path not a str In newer versions of Sphinx the env.doc2path() API is going to change to return a Path object rather than a str. This was originally visible in Sphinx 8.0.0rc1, but has been rolled back for the final 8.0.0 release. However it will probably emit a deprecation warning and is likely to change for good in 9.0: https://github.com/sphinx-doc/sphinx/issues/12686 Our use in depfile.py assumes a str, and if it is passed a Path it will fall over: Handler <function write_depfile at 0x77a1775ff560> for event 'build-finished' threw an exception (exception: unsupported operand type(s) for +: 'PosixPath' and 'str') Wrapping the env.doc2path() call in str() will coerce a Path object to the str we expect, and have no effect in older Sphinx versions that do return a str. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2458 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240729120533.2486427-1-peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> (cherry picked from commit `48e5b5f994`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	e5f76d5291	target/arm: Ignore SMCR_EL2.LEN and SVCR_EL2.LEN if EL2 is not enabled When determining the current vector length, the SMCR_EL2.LEN and SVCR_EL2.LEN settings should only be considered if EL2 is enabled (compare the pseudocode CurrentSVL and CurrentNSVL which call EL2Enabled()). We were checking against ARM_FEATURE_EL2 rather than calling arm_is_el2_enabled(), which meant that we would look at SMCR_EL2/SVCR_EL2 when in Secure EL1 or Secure EL0 even if Secure EL2 was not enabled. Use the correct check in sve_vqm1_for_el_sm(). Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-5-peter.maydell@linaro.org (cherry picked from commit `f573ac059e`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	540a70d2c4	target/arm: Avoid shifts by -1 in tszimm_shr() and tszimm_shl() The function tszimm_esz() returns a shift amount, or possibly -1 in certain cases that correspond to unallocated encodings in the instruction set. We catch these later in the trans_ functions (generally with an "a-esz < 0" check), but before we do the decodetree-generated code will also call tszimm_shr() or tszimm_sl(), which will use the tszimm_esz() return value as a shift count without checking that it is not negative, which is undefined behaviour. Avoid the UB by checking the return value in tszimm_shr() and tszimm_shl(). Cc: qemu-stable@nongnu.org Resolves: Coverity CID 1547617, 1547694 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-4-peter.maydell@linaro.org (cherry picked from commit `76916dfa89`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	188b8ad0e4	target/arm: Fix UMOPA/UMOPS of 16-bit values The UMOPA/UMOPS instructions are supposed to multiply unsigned 8 or 16 bit elements and accumulate the products into a 64-bit element. In the Arm ARM pseudocode, this is done with the usual infinite-precision signed arithmetic. However our implementation doesn't quite get it right, because in the DEF_IMOP_64() macro we do: sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); where NTYPE and MTYPE are uint16_t or int16_t. In the uint16_t case, the C usual arithmetic conversions mean the values are converted to "int" type and the multiply is done as a 32-bit multiply. This means that if the inputs are, for example, 0xffff and 0xffff then the result is 0xFFFE0001 as an int, which is then promoted to uint64_t for the accumulation into sum; this promotion incorrectly sign extends the multiply. Avoid the incorrect sign extension by casting to int64_t before the multiply, so we do the multiply as 64-bit signed arithmetic, which is a type large enough that the multiply can never overflow into the sign bit. (The equivalent 8-bit operations in DEF_IMOP_32() are fine, because the 8-bit multiplies can never overflow into the sign bit of a 32-bit integer.) Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2372 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-3-peter.maydell@linaro.org (cherry picked from commit `ea3f5a90f0`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	3a21e7bad6	target/arm: Don't assert for 128-bit tile accesses when SVL is 128 For an instruction which accesses a 128-bit element tile when the SVL is also 128 (for example MOV z0.Q, p0/M, ZA0H.Q[w0,0]), we will assert in get_tile_rowcol(): qemu-system-aarch64: ../../tcg/tcg-op.c:926: tcg_gen_deposit_z_i32: Assertion `len > 0' failed. This happens because we calculate len = ctz32(streaming_vec_reg_size(s)) - esz;$ but if the SVL and the element size are the same len is 0, and the deposit operation asserts. In this case the ZA storage contains exactly one 128 bit element ZA tile, and the horizontal or vertical slice is just that tile. This means that regardless of the index value in the Ws register, we always access that tile. (In pseudocode terms, we calculate (index + offset) MOD 1, which is 0.) Special case the len == 0 case to avoid hitting the assertion in tcg_gen_deposit_z_i32(). Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-2-peter.maydell@linaro.org (cherry picked from commit `56f1c0db92`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	a1019d125e	hw/misc/bcm2835_property: Fix handling of FRAMEBUFFER_SET_PALETTE The documentation of the "Set palette" mailbox property at https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface#set-palette says it has the form: Length: 24..1032 Value: u32: offset: first palette index to set (0-255) u32: length: number of palette entries to set (1-256) u32...: RGBA palette values (offset to offset+length-1) We get this wrong in a couple of ways: * we aren't checking the offset and length are in range, so the guest can make us spin for a long time by providing a large length * the bounds check on our loop is wrong: we should iterate through 'length' palette entries, not 'length - offset' entries Fix the loop to implement the bounds checks and get the loop condition right. In the process, make the variables local to this switch case, rather than function-global, so it's clearer what type they are when reading the code. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240723131029.1159908-2-peter.maydell@linaro.org (cherry picked from commit `0892fffc2a`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: context fix due to lack of v9.0.0-1812-g5d5f1b60916a "hw/misc: Implement mailbox properties for customer OTP and device specific private keys" also remove now-unused local `n' variable which gets removed in the next change in this file, v9.0.0-2720-g32f1c201eedf "hw/misc/bcm2835_property: Avoid overflow in OTP access properties")	2024-08-28 08:37:28 +03:00
Frederik van Hövell	0555a42377	hw/char/bcm2835_aux: Fix assert when receive FIFO fills up When a bare-metal application on the raspi3 board reads the AUX_MU_STAT_REG MMIO register while the device's buffer is at full receive FIFO capacity (i.e. `s->read_count == BCM2835_AUX_RX_FIFO_LEN`) the assertion `assert(s->read_count < BCM2835_AUX_RX_FIFO_LEN)` fails. Reported-by: Cryptjar <cryptjar@junk.studio> Suggested-by: Cryptjar <cryptjar@junk.studio> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/459 Signed-off-by: Frederik van Hövell <frederik@fvhovell.nl> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> [PMM: commit message tweaks] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit `546d574b11`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Richard Henderson	e10112aea6	target/rx: Use target_ulong for address in LI Using int32_t meant that the address was sign-extended to uint64_t when passing to translator_ld*, triggering an assert. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2453 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit `83340193b9`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Thomas Huth	908ceebba8	hw/virtio: Fix the de-initialization of vhost-user devices The unrealize functions of the various vhost-user devices are calling the corresponding vhost__set_status() functions with a status of 0 to shut down the device correctly. Now these vhost__set_status() functions all follow this scheme: bool should_start = virtio_device_should_start(vdev, status); if (vhost_dev_is_started(&vvc->vhost_dev) == should_start) { return; } if (should_start) { /* ... do the initialization stuff ... / } else { / ... do the cleanup stuff ... / } The problem here is virtio_device_should_start(vdev, 0) currently always returns "true" since it internally only looks at vdev->started instead of looking at the "status" parameter. Thus once the device got started once, virtio_device_should_start() always returns true and thus the vhost__set_status() functions return early, without ever doing any clean-up when being called with status == 0. This causes e.g. problems when trying to hot-plug and hot-unplug a vhost user devices multiple times since the de-initialization step is completely skipped during the unplug operation. This bug has been introduced in commit `9f6bcfd99f` ("hw/virtio: move vm_running check to virtio_device_started") which replaced should_start = status & VIRTIO_CONFIG_S_DRIVER_OK; with should_start = virtio_device_started(vdev, status); which later got replaced by virtio_device_should_start(). This blocked the possibility to set should_start to false in case the status flag VIRTIO_CONFIG_S_DRIVER_OK was not set. Fix it by adjusting the virtio_device_should_start() function to only consider the status flag instead of vdev->started. Since this function is only used in the various vhost_*_set_status() functions for exactly the same purpose, it should be fine to fix it in this central place there without any risk to change the behavior of other code. Fixes: `9f6bcfd99f` ("hw/virtio: move vm_running check to virtio_device_started") Buglink: https://issues.redhat.com/browse/RHEL-40708 Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20240618121958.88673-1-thuth@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `d72479b117`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Sergey Dyasli	d1eacbef3e	Revert "qemu-char: do not operate on sources from finalize callbacks" This reverts commit `2b316774f6`. After `038b421788` ("Revert "chardev: use a child source for qio input source"") we've been observing the "iwp->src == NULL" assertion triggering periodically during the initial capabilities querying by libvirtd. One of possible backtraces: Thread 1 (Thread 0x7f16cd4f0700 (LWP 43858)): 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 1 0x00007f16c6c21e65 in __GI_abort () at abort.c:79 2 0x00007f16c6c21d39 in __assert_fail_base at assert.c:92 3 0x00007f16c6c46e86 in __GI___assert_fail (assertion=assertion@entry=0x562e9bcdaadd "iwp->src == NULL", file=file@entry=0x562e9bcdaac8 "../chardev/char-io.c", line=line@entry=99, function=function@entry=0x562e9bcdab10 <__PRETTY_FUNCTION__.20549> "io_watch_poll_finalize") at assert.c:101 4 0x0000562e9ba20c2c in io_watch_poll_finalize (source=<optimized out>) at ../chardev/char-io.c:99 5 io_watch_poll_finalize (source=<optimized out>) at ../chardev/char-io.c:88 6 0x00007f16c904aae0 in g_source_unref_internal () from /lib64/libglib-2.0.so.0 7 0x00007f16c904baf9 in g_source_destroy_internal () from /lib64/libglib-2.0.so.0 8 0x0000562e9ba20db0 in io_remove_watch_poll (source=0x562e9d6720b0) at ../chardev/char-io.c:147 9 remove_fd_in_watch (chr=chr@entry=0x562e9d5f3800) at ../chardev/char-io.c:153 10 0x0000562e9ba23ffb in update_ioc_handlers (s=0x562e9d5f3800) at ../chardev/char-socket.c:592 11 0x0000562e9ba2072f in qemu_chr_fe_set_handlers_full at ../chardev/char-fe.c:279 12 0x0000562e9ba207a9 in qemu_chr_fe_set_handlers at ../chardev/char-fe.c:304 13 0x0000562e9ba2ca75 in monitor_qmp_setup_handlers_bh (opaque=0x562e9d4c2c60) at ../monitor/qmp.c:509 14 0x0000562e9bb6222e in aio_bh_poll (ctx=ctx@entry=0x562e9d4c2f20) at ../util/async.c:216 15 0x0000562e9bb4de0a in aio_poll (ctx=0x562e9d4c2f20, blocking=blocking@entry=true) at ../util/aio-posix.c:722 16 0x0000562e9b99dfaa in iothread_run (opaque=0x562e9d4c26f0) at ../iothread.c:63 17 0x0000562e9bb505a4 in qemu_thread_start (args=0x562e9d4c7ea0) at ../util/qemu-thread-posix.c:543 18 0x00007f16c70081ca in start_thread (arg=<optimized out>) at pthread_create.c:479 19 0x00007f16c6c398d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 io_remove_watch_poll(), which makes sure that iwp->src is NULL, calls g_source_destroy() which finds that iwp->src is not NULL in the finalize callback. This can only happen if another thread has managed to trigger io_watch_poll_prepare() callback in the meantime. Move iwp->src destruction back to the finalize callback to prevent the described race, and also remove the stale comment. The deadlock glib bug was fixed back in 2010 by b35820285668 ("gmain: move finalization of GSource outside of context lock"). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sergey Dyasli <sergey.dyasli@nutanix.com> Link: https://lore.kernel.org/r/20240712092659.216206-1-sergey.dyasli@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit `e0bf95443e`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Peter Maydell	7c17a7a9bd	util/async.c: Forbid negative min/max in aio_context_set_thread_pool_params() aio_context_set_thread_pool_params() takes two int64_t arguments to set the minimum and maximum number of threads in the pool. We do some bounds checking on these, but we don't catch the case where the inputs are negative. This means that later in the function when we assign these inputs to the AioContext::thread_pool_min and ::thread_pool_max fields, which are of type int, the values might overflow the smaller type. A negative number of threads is meaningless, so make aio_context_set_thread_pool_params() return an error if either min or max are negative. Resolves: Coverity CID 1547605 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240723150927.1396456-1-peter.maydell@linaro.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> (cherry picked from commit `851495571d`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-08-28 08:37:28 +03:00
Song Gao	50675777a0	target/loongarch: Fix helper_lddir() a CID INTEGER_OVERFLOW issue When the lddir level is 4 and the base is a HugePage, we may try to put value 4 into a field in the TLBENTRY that is only 2 bits wide. Fixes: Coverity CID 1547717 Fixes: `9c70db9a43` ("target/loongarch: Fix tlb huge page loading issue") Signed-off-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240724015853.1317396-1-gaosong@loongson.cn> (cherry picked from commit `a18ffbcf8b`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-26 13:12:12 +03:00
Philippe Mathieu-Daudé	9d0076b4a1	hw/intc/loongson_ipi: Fix resource leak Once initialised, QOM objects can be realized and unrealized multiple times before being finalized. Resources allocated in REALIZE must be deallocated in an equivalent UNREALIZE handler. Free the CPU array in loongson_ipi_unrealize() instead of loongson_ipi_finalize(). Cc: qemu-stable@nongnu.org Fixes: `5e90b8db38` ("hw/loongarch: Set iocsr address space per-board rather than percpu") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20240723111405.14208-3-philmd@linaro.org> (cherry picked from commit `0c2086bc73`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: rename loongson back to longarch for 9.0 due to lack of v9.0.0-582-gb4a12dfc2132 "hw/intc/loongarch_ipi: Rename as loongson_ipi")	2024-07-24 16:14:36 +03:00
Bibo Mao	437c556037	hw/intc/loongson_ipi: Access memory in little endian Loongson IPI is only available in little-endian, so use that to access the guest memory (in case we run on a big-endian host). Cc: qemu-stable@nongnu.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Fixes: `f6783e3438` ("hw/loongarch: Add LoongArch ipi interrupt support") [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Tested-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20240718133312.10324-3-philmd@linaro.org> (cherry picked from commit `2465c89fb9`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: fixups for 9.0, for lack of: v9.0.0-583-g91d0b151de4c "hw/intc/loongson_ipi: Implement IOCSR address space for MIPS" v9.0.0-582-gb4a12dfc2132 "hw/intc/loongarch_ipi: Rename as loongson_ipi")	2024-07-24 13:13:38 +03:00
songziming	425457bf75	chardev/char-win-stdio.c: restore old console mode If I use `-serial stdio` on Windows, after QEMU exits, the terminal could not handle arrow keys and tab any more. Because stdio backend on Windows sets console mode to virtual terminal input when starts, but does not restore the old mode when finalize. This small patch saves the old console mode and set it back. Signed-off-by: Ziming Song <s.ziming@hotmail.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-ID: <ME3P282MB25488BE7C39BF0C35CD0DA5D8CA82@ME3P282MB2548.AUSP282.PROD.OUTLOOK.COM> (cherry picked from commit `903cc9e117`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:47:38 +03:00
Paolo Bonzini	8b9e56e732	target/i386: do not crash if microvm guest uses SGX CPUID leaves sgx_epc_get_section assumes a PC platform is in use: bool sgx_epc_get_section(int section_nr, uint64_t addr, uint64_t size) { PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); However, sgx_epc_get_section is called by CPUID regardless of whether SGX state has been initialized or which platform is in use. Check whether the machine has the right QOM class and if not behave as if there are no EPC sections. Fixes: `1dec2e1f19` ("i386: Update SGX CPUID info according to hardware/KVM/user input", 2021-09-30) Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2142 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit `13be929aff`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:45:34 +03:00
Clément Mathieu--Drif	485637f282	intel_iommu: fix FRCD construction macro The constant must be unsigned, otherwise the two's complement overrides the other fields when a PASID is present. Fixes: `1b2b12376c` ("intel-iommu: PASID support") Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Minwoo Im <minwoo.im@samsung.com> Message-Id: <20240709142557.317271-2-clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `a3c8d7e385`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:41:28 +03:00
Manos Pitsidianakis	439009617e	virtio-snd: check for invalid param shift operands When setting the parameters of a PCM stream, we compute the bit flag with the format and rate values as shift operand to check if they are set in supported_formats and supported_rates. If the guest provides a format/rate value which when shifting 1 results in a value bigger than the number of bits in supported_formats/supported_rates, we must report an error. Previously, this ended up triggering the not reached assertions later when converting to internal QEMU values. Reported-by: Zheyu Ma <zheyuma97@gmail.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2416 Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <virtio-snd-fuzz-2416-fix-v1-manos.pitsidianakis@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `9b6083465f`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:38:33 +03:00
Manos Pitsidianakis	6ef295cb1c	virtio-snd: add max size bounds check in input cb When reading input audio in the virtio-snd input callback, virtio_snd_pcm_in_cb(), we do not check whether the iov can actually fit the data buffer. This is because we use the buffer->size field as a total-so-far accumulator instead of byte-size-left like in TX buffers. This triggers an out of bounds write if the size of the virtio queue element is equal to virtio_snd_pcm_status, which makes the available space for audio data zero. This commit adds a check for reaching the maximum buffer size before attempting any writes. Reported-by: Zheyu Ma <zheyuma97@gmail.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2427 Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <virtio-snd-fuzz-2427-fix-v1-manos.pitsidianakis@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `98e77e3dd8`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:35:33 +03:00
Zhao Liu	e9e92433c8	hw/cxl/cxl-host: Fix segmentation fault when getting cxl-fmw property QEMU crashes (Segmentation fault) when getting cxl-fmw property via qmp: (QEMU) qom-get path=machine property=cxl-fmw This issue is caused by accessing wrong callback (opaque) type in machine_get_cfmw(). cxl_machine_init() sets the callback as `CXLState ` type but machine_get_cfmw() treats the callback as `CXLFixedMemoryWindowOptionsList `. Fix this error by casting opaque to `CXLState ` type in machine_get_cfmw(). Fixes: `03b39fcf64` ("hw/cxl: Make the CXL fixed memory window setup a machine parameter.") Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Xingtao Yao <yaoxt.fnst@fujitsu.com> Link: https://lore.kernel.org/r/20240704093404.1848132-1-zhao1.liu@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705113956.941732-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit `a207d5f87d`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-24 07:31:09 +03:00
Zheyu Ma	7676498754	hw/nvme: fix memory leak in nvme_dsm The allocated memory to hold LBA ranges leaks in the nvme_dsm function. This happens because the allocated memory for iocb->range is not freed in all error handling paths. Fix this by adding a free to ensure that the allocated memory is properly freed. ASAN log: ==3075137==ERROR: LeakSanitizer: detected memory leaks Direct leak of 480 byte(s) in 6 object(s) allocated from: #0 0x55f1f8a0eddd in malloc llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3 #1 0x7f531e0f6738 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738) #2 0x55f1faf1f091 in blk_aio_get block/block-backend.c:2583:12 #3 0x55f1f945c74b in nvme_dsm hw/nvme/ctrl.c:2609:30 #4 0x55f1f945831b in nvme_io_cmd hw/nvme/ctrl.c:4470:16 #5 0x55f1f94561b7 in nvme_process_sq hw/nvme/ctrl.c:7039:29 Cc: qemu-stable@nongnu.org Fixes: `d7d1474fd8` ("hw/nvme: reimplement dsm to allow cancellation") Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> (cherry picked from commit `c510fe78f1`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-23 21:00:28 +03:00
Akihiko Odaki	44304ff1a8	hvf: arm: Do not advance PC when raising an exception hvf did not advance PC when raising an exception for most unhandled system registers, but it mistakenly advanced PC when raising an exception for GICv3 registers. Cc: qemu-stable@nongnu.org Fixes: `a2260983c6` ("hvf: arm: Add support for GICv3") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240716-pmu-v3-4-8c7c1858a227@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit `30a1690f24`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:52:11 +03:00
Richard Henderson	43929c818c	target/arm: Use FPST_F16 for SME FMOPA (widening) This operation has float16 inputs and thus must use the FZ16 control not the FZ control. Cc: qemu-stable@nongnu.org Fixes: `3916841ac7` ("target/arm: Implement FMOPA, FMOPS (widening)") Reported-by: Daniyal Khan <danikhan632@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20240717060149.204788-3-richard.henderson@linaro.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit `207d30b5fd`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:51:57 +03:00
Daniyal Khan	0169889dca	target/arm: Use float_status copy in sme_fmopa_s We made a copy above because the fp exception flags are not propagated back to the FPST register, but then failed to use the copy. Cc: qemu-stable@nongnu.org Fixes: `558e956c71` ("target/arm: Implement FMOPA, FMOPS (non-widening)") Signed-off-by: Daniyal Khan <danikhan632@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20240717060149.204788-2-richard.henderson@linaro.org [rth: Split from a larger patch] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> (cherry picked from commit `31d93fedf4`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:51:12 +03:00
Peter Maydell	50ecd5b1fd	target/arm: LDAPR should honour SCTLR_ELx.nAA In commit `c1a1f80518` when we added the FEAT_LSE2 relaxations to the alignment requirements for atomic and ordered loads and stores, we didn't quite get it right for LDAPR/LDAPRH/LDAPRB with no immediate offset. These instructions were handled in the old decoder as part of disas_ldst_atomic(), but unlike all the other insns that function decoded (LDADD, LDCLR, etc) these insns are "ordered", not "atomic", so they should be using check_ordered_align() rather than check_atomic_align(). Commit `c1a1f80518` used check_atomic_align() regardless for everything in disas_ldst_atomic(). We then carried that incorrect check over in the decodetree conversion, where LDAPR/LDAPRH/LDAPRB are now handled by trans_LDAPR(). The effect is that when FEAT_LSE2 is implemented, these instructions don't honour the SCTLR_ELx.nAA bit and will generate alignment faults when they should not. (The LDAPR insns with an immediate offset were in disas_ldst_ldapr_stlr() and then in trans_LDAPR_i() and trans_STLR_i(), and have always used the correct check_ordered_align().) Use check_ordered_align() in trans_LDAPR(). Cc: qemu-stable@nongnu.org Fixes: `c1a1f80518` ("target/arm: Relax ordered/atomic alignment checks for LSE2") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240709134504.3500007-3-peter.maydell@linaro.org (cherry picked from commit `25489b521b`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:50:46 +03:00
Peter Maydell	c68b380b62	target/arm: Fix handling of LDAPR/STLR with negative offset When we converted the LDAPR/STLR instructions to decodetree we accidentally introduced a regression where the offset is negative. The 9-bit immediate field is signed, and the old hand decoder correctly used sextract32() to get it out of the insn word, but the ldapr_stlr_i pattern in the decode file used "imm:9" instead of "imm:s9", so it treated the field as unsigned. Fix the pattern to treat the field as a signed immediate. Cc: qemu-stable@nongnu.org Fixes: `2521b6073b` ("target/arm: Convert LDAPR/STLR (imm) to decodetree") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2419 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240709134504.3500007-2-peter.maydell@linaro.org (cherry picked from commit `5669d26ec6`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:50:25 +03:00
Markus Armbruster	ab32beaa8e	qapi/qom: Document feature unstable of @x-vfio-user-server Commit `8f9a9259d3` added ObjectType member @x-vfio-user-server with feature unstable, but neglected to explain why it is unstable. Do that now. Fixes: `8f9a9259d3` (vfio-user: define vfio-user-server object) Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com> Cc: John G Johnson <john.g.johnson@oracle.com> Cc: Jagannathan Raman <jag.raman@oracle.com> Cc: qemu-stable@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240703095310.1242102-1-armbru@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> [Indentation fixed] (cherry picked from commit `3becc93908`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:44:07 +03:00
Fiona Ebner	34ac08aa1c	scsi: fix regression and honor bootindex again for legacy drives Commit `3089637461` ("scsi: Don't ignore most usb-storage properties") removed the call to object_property_set_int() and thus the 'set' method for the bootindex property was also not called anymore. Here that method is device_set_bootindex() (as configured by scsi_dev_instance_init() -> device_add_bootindex_property()) which as a side effect registers the device via add_boot_device_path(). As reported by a downstream user [0], the bootindex property did not have the desired effect anymore for legacy drives. Fix the regression by explicitly calling the add_boot_device_path() function after checking that the bootindex is not yet used (to avoid add_boot_device_path() calling exit()). [0]: https://forum.proxmox.com/threads/149772/post-679433 Cc: qemu-stable@nongnu.org Fixes: `3089637461` ("scsi: Don't ignore most usb-storage properties") Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Link: https://lore.kernel.org/r/20240710152529.1737407-1-f.ebner@proxmox.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit `57a8a80d1a`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:43:05 +03:00
Fiona Ebner	fb35521448	hw/scsi/lsi53c895a: bump instruction limit in scripts processing to fix regression Commit `9876359990` ("hw/scsi/lsi53c895a: add timer to scripts processing") reduced the maximum allowed instruction count by a factor of 100 all the way down to 100. This causes the "Check Point R81.20 Gaia" appliance [0] to fail to boot after fully finishing the installation via the appliance's web interface (there is already one reboot before that). With a limit of 150, the appliance still fails to boot, while with a limit of 200, it works. Bump to 500 to fix the regression and be on the safe side. Originally reported in the Proxmox community forum[1]. [0]: https://support.checkpoint.com/results/download/124397 [1]: https://forum.proxmox.com/threads/149772/post-683459 Cc: qemu-stable@nongnu.org Fixes: `9876359990` ("hw/scsi/lsi53c895a: add timer to scripts processing") Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Acked-by: Sven Schnelle <svens@stackframe.org> Link: https://lore.kernel.org/r/20240715131403.223239-1-f.ebner@proxmox.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit `a4975023fb`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-07-19 19:42:44 +03:00

1 2 3 4 5 ...

112429 Commits