mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Emanuele Giuseppe Esposito	7455ff1aa0	aio_wait_kick: add missing memory barrier It seems that aio_wait_kick always required a memory barrier or atomic operation in the caller, but nobody actually took care of doing it. Let's put the barrier in the function instead, and pair it with another one in AIO_WAIT_WHILE. Read aio_wait_kick() comment for further explanation. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220524173054.12651-1-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Fabian Ebner	9b38fc56c0	block/gluster: correctly set max_pdiscard On 64-bit platforms, assigning SIZE_MAX to the int64_t max_pdiscard results in a negative value, and the following assertion would trigger down the line (it's not the same max_pdiscard, but computed from the other one): qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion `max_pdiscard >= bs->bl.request_alignment' failed. On 32-bit platforms, it's fine to keep using SIZE_MAX. The assertion in qemu_gluster_co_pdiscard() is checking that the value of 'bytes' can safely be passed to glfs_discard_async(), which takes a size_t for the argument in question, so it is kept as is. And since max_pdiscard is still <= SIZE_MAX, relying on max_pdiscard is still fine. Fixes: `0c8022876f` ("block: use int64_t instead of int in driver discard handlers") Cc: qemu-stable@nongnu.org Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Message-Id: <20220520075922.43972-1-f.ebner@proxmox.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefano Garzarella	66dc5f9606	block/rbd: report a better error when namespace does not exist If the namespace does not exist, rbd_create() fails with -ENOENT and QEMU reports a generic "error rbd create: No such file or directory": $ qemu-img create rbd:rbd/namespace/image 1M Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576 qemu-img: rbd:rbd/namespace/image: error rbd create: No such file or directory Unfortunately rados_ioctx_set_namespace() does not fail if the namespace does not exist, so let's use rbd_namespace_exists() in qemu_rbd_connect() to check if the namespace exists, reporting a more understandable error: $ qemu-img create rbd:rbd/namespace/image 1M Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576 qemu-img: rbd:rbd/namespace/image: namespace 'namespace' does not exist Reported-by: Tingting Mao <timao@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517071012.6120-1-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	ca941c406c	qsd: document vduse-blk exports Document vduse-blk exports in qemu-storage-daemon --help and the qemu-storage-daemon(1) man page. Based-on: <20220523084611.91-1-xieyongji@bytedance.com> Cc: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220525121947.859820-1-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	d043e2db87	libvduse: Add support for reconnecting To support reconnecting after restart or crash, VDUSE backend might need to resubmit inflight I/Os. This stores the metadata such as the index of inflight I/O's descriptors to a shm file so that VDUSE backend can restore them during reconnecting. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-9-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	9e4dea6727	vduse-blk: Add vduse-blk resize support To support block resize, this uses vduse_dev_update_config() to update the capacity field in configuration space and inject config interrupt on the block resize callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-8-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	2a2359b844	vduse-blk: Implement vduse-blk export This implements a VDUSE block backends based on the libvduse library. We can use it to export the BDSs for both VM and container (host) usage. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on After the qemu-storage-daemon started, we need to use the "vdpa" command to attach the device to vDPA bus: $ vdpa dev add name vduse-export0 mgmtdev vduse Also the device must be removed via the "vdpa" command before we stop the qemu-storage-daemon. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-7-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	a6caeee811	libvduse: Add VDUSE (vDPA Device in Userspace) library VDUSE [1] is a linux framework that makes it possible to implement software-emulated vDPA devices in userspace. This adds a library as a subproject to help implementing VDUSE backends in QEMU. [1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-6-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	92e879505f	linux-headers: Add vduse.h This adds vduse header to linux headers so that the relevant VDUSE API can be used in subsequent patches. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-5-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	5c36802970	block/export: Abstract out the logic of virtio-blk I/O process Abstract the common logic of virtio-blk I/O process to a function named virtio_blk_process_req(). It's needed for the following commit. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-4-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	8e7fd6f623	block/export: Fix incorrect length passed to vu_queue_push() Now the req->size is set to the correct value only when handling VIRTIO_BLK_T_GET_ID request. This patch fixes it. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220523084611.91-3-xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Xie Yongji	ac1fc3a3a9	block: Support passing NULL ops to blk_set_dev_ops() This supports passing NULL ops to blk_set_dev_ops() so that we can remove stale ops in some cases. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220523084611.91-2-xieyongji@bytedance.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	618af89e55	block: simplify handling of try to merge different sized bitmaps We have too much logic to simply check that bitmaps are of the same size. Let's just define that hbitmap_merge() and bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of same size, this simplifies things. Let's look through the callers: For backup_init_bcs_bitmap() we already assert that merge can't fail. In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error that can't happen: successor always has same size as its parent, drop this logic. In bdrv_merge_dirty_bitmap() we already has assertion and separate check. Make the check explicit and improve error message. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-4-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	58cbfbdf73	block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap We don't need extra bitmap. All we need is to backup the original bitmap when we do first merge. So, drop extra temporary bitmap and work directly with target and backup. Still to keep old semantics, that on failure target is unchanged and user don't need to restore, we need a local_backup variable and do restore ourselves on failure path. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220517111206.23585-3-v.sementsov-og@mail.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Vladimir Sementsov-Ogievskiy	775b30b305	block: block_dirty_bitmap_merge(): fix error path At the end we ignore failure of bdrv_merge_dirty_bitmap() and report success. And still set errp. That's wrong. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-2-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	1ab5096b3a	block: get rid of blk->guest_block_size Commit `1b7fd72955` ("block: rename buffer_alignment to guest_block_size") noted: At this point, the field is set by the device emulation, but completely ignored by the block layer. The last time the value of buffer_alignment/guest_block_size was actually used was before commit `339064d506` ("block: Don't use guest sector size for qemu_blockalign()"). This value has not been used since 2013. Get rid of it. Cc: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220518130945.2657905-1-stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Stefan Hajnoczi	3399848b7f	block: drop unused bdrv_co_drain() API bdrv_co_drain() has not been used since commit `9a0cec664e` ("mirror: use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there are fewer drain scenarios to worry about. Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed" functions that can be called from coroutine context. Unlike bdrv_co_drain(), these functions provide control of the length of the drained section, which is usually the right thing. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220521122714.3837731-1-stefanha@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2022-06-24 17:07:06 +02:00
Richard Henderson	3a821c52e1	hw/nvme updates - sriov functionality - odd fixes -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmK02wUACgkQTeGvMW1P DenNPwgAwhQCXXacTb+6vEdxN30QoWygzQj5BLm//SiXlj7hBX7P/JqCxYF5vUDU EaZkl4n3ry5T1xqlUWIBFdIAmKyrsWz2eKTrX41g64i/L+/nfJXZ+IgQc3WkM/FK 5NwwAE8q/JGiRczLesF/9QvQq/90L6QtyC48bsS8AIcl5IcqHCKGwEJS7LErltex YZDJyTNU4wB2XFophylJUL43GrHa/kUFA2ZHgs9iuH0p5LGG6UM3KoinBKcbwn47 iEWKccvsHSyfE8VpJJS5STMEeGGaBPziZ654ElLmzVq6EXDKMCoX03naQ9Q8oSpl FiktbxllCYdmECb44PNBEd/nLdpCdQ== =o54a -----END PGP SIGNATURE----- Merge tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme into staging hw/nvme updates - sriov functionality - odd fixes # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmK02wUACgkQTeGvMW1P # DenNPwgAwhQCXXacTb+6vEdxN30QoWygzQj5BLm//SiXlj7hBX7P/JqCxYF5vUDU # EaZkl4n3ry5T1xqlUWIBFdIAmKyrsWz2eKTrX41g64i/L+/nfJXZ+IgQc3WkM/FK # 5NwwAE8q/JGiRczLesF/9QvQq/90L6QtyC48bsS8AIcl5IcqHCKGwEJS7LErltex # YZDJyTNU4wB2XFophylJUL43GrHa/kUFA2ZHgs9iuH0p5LGG6UM3KoinBKcbwn47 # iEWKccvsHSyfE8VpJJS5STMEeGGaBPziZ654ElLmzVq6EXDKMCoX03naQ9Q8oSpl # FiktbxllCYdmECb44PNBEd/nLdpCdQ== # =o54a # -----END PGP SIGNATURE----- # gpg: Signature made Thu 23 Jun 2022 02:28:37 PM PDT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme: hw/nvme: clear aen mask on reset Revert "hw/block/nvme: add support for sgl bit bucket descriptor" hw/nvme: clean up CC register write logic hw/acpi: Make the PCI hot-plug aware of SR-IOV hw/nvme: Update the initalization place for the AER queue docs: Add documentation for SR-IOV and Virtualization Enhancements hw/nvme: Add support for the Virtualization Management command hw/nvme: Initialize capability structures for primary/secondary controllers hw/nvme: Calculate BAR attributes in a function hw/nvme: Remove reg_size variable and update BAR0 size calculation hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime hw/nvme: Implement the Function Level Reset hw/nvme: Add support for Secondary Controller List hw/nvme: Add support for Primary Controller Capabilities hw/nvme: Add support for SR-IOV Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2022-06-23 14:52:30 -07:00
Klaus Jensen	98836e8e01	hw/nvme: clear aen mask on reset The internally maintained AEN mask is not cleared on reset. Fix this. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Klaus Jensen	b9147a3aa1	Revert "hw/block/nvme: add support for sgl bit bucket descriptor" This reverts commit `d97eee64fe`. The emulated controller correctly accounts for not including bit buckets in the controller-to-host data transfer, however it doesn't correctly account for the holes for the on-disk data offsets. Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Klaus Jensen	cc9bcee265	hw/nvme: clean up CC register write logic The SRIOV series exposed an issued with how CC register writes are handled and how CSTS is set in response to that. Specifically, after applying the SRIOV series, the controller could end up in a state with CC.EN set to '1' but with CSTS.RDY cleared to '0', causing drivers to expect CSTS.RDY to transition to '1' but timing out. Clean this up. Reviewed-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	58660bfa36	hw/acpi: Make the PCI hot-plug aware of SR-IOV PCI device capable of SR-IOV support is a new, still-experimental feature with only a single working example of the Nvme device. This patch in an attempt to fix a double-free problem when a SR-IOV-capable Nvme device is hot-unplugged in the following scenario: Qemu CLI: --------- -device pcie-root-port,slot=0,id=rp0 -device nvme-subsys,id=subsys0 -device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1 Guest OS: --------- sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0 sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0 echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset sleep 1 echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2 nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0 sleep 2 echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind Qemu monitor: ------------- device_del nvme0 Explanation of the problem and the proposed solution: 1) The current SR-IOV implementation assumes it’s the PhysicalFunction that creates and deletes VirtualFunctions. 2) It’s a design decision (the Nvme device at least) for the VFs to be of the same class as PF. Effectively, they share the dc->hotpluggable value. 3) When a VF is created, it’s added as a child node to PF’s PCI bus slot. 4) Monitor/device_del triggers the ACPI mechanism. The implementation is not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes. 5) VFs are unrealized directly, and it doesn’t work well with (1). SR/IOV structures are not updated, so when it’s PF’s turn to be unrealized, it works on stale pointers to already-deleted VFs. The proposed fix is to make the PCI ACPI code aware of SR/IOV. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	b7698b917a	hw/nvme: Update the initalization place for the AER queue This patch updates the initialization place for the AER queue, so it’s initialized once, at controller initialization, and not every time controller is enabled. While the original version works for a non-SR-IOV device, as it’s hard to interact with the controller if it’s not enabled, the multiple reinitialization is not necessarily correct. With the SR/IOV feature enabled a segfault can happen: a VF can have its controller disabled, while a namespace can still be attached to the controller through the parent PF. An event generated in such case ends up on an uninitialized queue. While it’s an interesting question whether a VF should support AER in the first place, I don’t think it must be answered today. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Lukasz Maniak	751babf5bb	docs: Add documentation for SR-IOV and Virtualization Enhancements Documentation describes 5 new parameters being added regarding SR-IOV: sriov_max_vfs sriov_vq_flexible sriov_vi_flexible sriov_max_vi_per_vf sriov_max_vq_per_vf The description also includes the simplest possible QEMU invocation and the series of NVMe commands required to enable SR-IOV support. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	11871f53ef	hw/nvme: Add support for the Virtualization Management command With the new command one can: - assign flexible resources (queues, interrupts) to primary and secondary controllers, - toggle the online/offline state of given controller. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	746d42b133	hw/nvme: Initialize capability structures for primary/secondary controllers With four new properties: - sriov_v{i,q}_flexible, - sriov_max_v{i,q}_per_vf, one can configure the number of available flexible resources, as well as the limits. The primary and secondary controller capability structures are initialized accordingly. Since the number of available queues (interrupts) now varies between VF/PF, BAR size calculation is also adjusted. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	aa81771337	hw/nvme: Calculate BAR attributes in a function An NVMe device with SR-IOV capability calculates the BAR size differently for PF and VF, so it makes sense to extract the common code to a separate function. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	3bfcc51737	hw/nvme: Remove reg_size variable and update BAR0 size calculation The n->reg_size parameter unnecessarily splits the BAR0 size calculation in two phases; removed to simplify the code. With all the calculations done in one place, it seems the pow2ceil, applied originally to reg_size, is unnecessary. The rounding should happen as the last step, when BAR size includes Nvme registers, queue registers, and MSIX-related space. Finally, the size of the mmio memory region is extended to cover the 1st 4KiB padding (see the map below). Access to this range is handled as interaction with a non-existing queue and generates an error trace, so actually nothing changes, while the reg_size variable is no longer needed. -------------------- \| BAR0 \| -------------------- [Nvme Registers ] [Queues ] [power-of-2 padding] - removed in this patch [4KiB padding (1) ] [MSIX TABLE ] [4KiB padding (2) ] [MSIX PBA ] [power-of-2 padding] Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:29 +02:00
Łukasz Gieryk	decc02614f	hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having them as constants is problematic for SR-IOV support. SR-IOV introduces virtual resources (queues, interrupts) that can be assigned to PF and its dependent VFs. Each device, following a reset, should work with the configured number of queues. A single constant is no longer sufficient to hold the whole state. This patch tries to solve the problem by introducing additional variables in NvmeCtrl’s state. The variables for, e.g., managing queues are therefore organized as: - n->params.max_ioqpairs – no changes, constant set by the user - n->(mutable_state) – (not a part of this patch) user-configurable, specifies number of queues available _after_ reset - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’ n->params.max_ioqpairs; initialized in realize() and updated during reset() to reflect user’s changes to the mutable state Since the number of available i/o queues and interrupts can change in runtime, buffers for sq/cqs and the MSIX-related structures are allocated big enough to handle the limits, to completely avoid the complicated reallocation. A helper function (nvme_update_msixcap_ts) updates the corresponding capability register, to signal configuration changes. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Łukasz Gieryk	1e9c685ec7	hw/nvme: Implement the Function Level Reset This patch implements the Function Level Reset, a feature currently not implemented for the Nvme device, while listed as a mandatory ("shall") in the 1.4 spec. The implementation reuses FLR-related building blocks defined for the pci-bridge module, and follows the same logic: - FLR capability is advertised in the PCIE config, - custom pci_write_config callback detects a write to the trigger register and performs the PCI reset, - which, eventually, calls the custom dc->reset handler. Depending on reset type, parts of the state should (or should not) be cleared. To distinguish the type of reset, an additional parameter is passed to the reset function. This patch also enables advertisement of the Power Management PCI capability. The main reason behind it is to announce the no_soft_reset=1 bit, to signal SR-IOV support where each VF can be reset individually. The implementation purposedly ignores writes to the PMCS.PS register, as even such naïve behavior is enough to correctly handle the D3->D0 transition. It’s worth to note, that the power state transition back to to D3, with all the corresponding side effects, wasn't and stil isn't handled properly. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	99f48ae7ae	hw/nvme: Add support for Secondary Controller List Introduce handling for Secondary Controller List (Identify command with CNS value of 15h). Secondary controller ids are unique in the subsystem, hence they are reserved by it upon initialization of the primary controller to the number of sriov_max_vfs. ID reservation requires the addition of an intermediate controller slot state, so the reserved controller has the address 0xFFFF. A secondary controller is in the reserved state when it has no virtual function assigned, but its primary controller is realized. Secondary controller reservations are released to NULL when its primary controller is unregistered. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	5e6f963f01	hw/nvme: Add support for Primary Controller Capabilities Implementation of Primary Controller Capabilities data structure (Identify command with CNS value of 14h). Currently, the command returns only ID of a primary controller. Handling of remaining fields are added in subsequent patches implementing virtualization enhancements. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Lukasz Maniak	44c2c09488	hw/nvme: Add support for SR-IOV This patch implements initial support for Single Root I/O Virtualization on an NVMe device. Essentially, it allows to define the maximum number of virtual functions supported by the NVMe controller via sriov_max_vfs parameter. Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV capability by a physical controller and ARI capability by both the physical and virtual function devices. NVMe controllers created via virtual functions mirror functionally the physical controller, which may not entirely be the case, thus consideration would be needed on the way to limit the capabilities of the VF. NVMe subsystem is required for the use of SR-IOV. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2022-06-23 23:24:28 +02:00
Richard Henderson	7db86fe2ed	[v3] Migration pull 2022-06-23 This replaces my and Juan's earlier pulls over the last 2 days; 4th time lucky? Compared to my pull earlier: Removed Hyman's dirty ring set In this migration PULL request: - Dainiel Berrangé - qemufileops cleanup - Leonardo Bras - cleanups for zero copy - Juan Quintela - RDMA cleanups Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmK0mnwACgkQBRYzHrxb /ecJYQ/8DdQBVYpJjpHj3mBx39aXodM7uM4Jt1okuSF92y9KRSNdIs3nvvwWAZbL dWAUHUZBNOfZF7Eqe6WWCIDNxUTz88RkMM16N3+a5sqBa0xU6rP6cvyw9vYrnsmx aHVQ1ESosTby2qcb1ofjYRXWNt7GhDtRIH55m3mSalP/WAgjMe3MsrAtz66T4w55 4paTVwgS/WMuLD9dqyESyofePnvtM8z3ye2a75JRscBQYmpO+XuX3IX5ah6m439s fI1BezQU2Q4YNDmCEWvdfZ2tqgcfi8dHnu0JB9QTfbkPVh9jw25VPpnymimMB7iW MlXAlDr7m9HQI6OjIkq8pXBcgWQpbVGMon1CcrDmGCReEjnQ5lTsb27fkXzf/Nwu 09iuNfYGcSGAbZ8GZa/lrRTGeINrSj99uOVxrTvVS0db2+1va3hkamGMULhsdX6O smOrje79pVLAr7JJSMH2bqFv9cKtLu77HndSVtswkRRMhtDU+VQI5FxYlwueuawJ toDM4DJMd3pJHIpPrUwxlo4D9dkdxPfqC1GATDPxw9/vYgbORn8fkt5g9EYxBzc0 pWRY9SNuw0MC54JGEoFc77+VKJXK1A97j9GoF+Vyoh30yTgZ3q9tm2eElpYwtHDy t8zEVC9QadcgMdRAnJqgZgaWdfwKiHpjplSn5lOGDLOo7gfSmik= =ajVU -----END PGP SIGNATURE----- Merge tag 'pull-migration-20220623b' of https://gitlab.com/dagrh/qemu into staging [v3] Migration pull 2022-06-23 This replaces my and Juan's earlier pulls over the last 2 days; 4th time lucky? Compared to my pull earlier: Removed Hyman's dirty ring set In this migration PULL request: - Dainiel Berrangé - qemufileops cleanup - Leonardo Bras - cleanups for zero copy - Juan Quintela - RDMA cleanups Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmK0mnwACgkQBRYzHrxb # /ecJYQ/8DdQBVYpJjpHj3mBx39aXodM7uM4Jt1okuSF92y9KRSNdIs3nvvwWAZbL # dWAUHUZBNOfZF7Eqe6WWCIDNxUTz88RkMM16N3+a5sqBa0xU6rP6cvyw9vYrnsmx # aHVQ1ESosTby2qcb1ofjYRXWNt7GhDtRIH55m3mSalP/WAgjMe3MsrAtz66T4w55 # 4paTVwgS/WMuLD9dqyESyofePnvtM8z3ye2a75JRscBQYmpO+XuX3IX5ah6m439s # fI1BezQU2Q4YNDmCEWvdfZ2tqgcfi8dHnu0JB9QTfbkPVh9jw25VPpnymimMB7iW # MlXAlDr7m9HQI6OjIkq8pXBcgWQpbVGMon1CcrDmGCReEjnQ5lTsb27fkXzf/Nwu # 09iuNfYGcSGAbZ8GZa/lrRTGeINrSj99uOVxrTvVS0db2+1va3hkamGMULhsdX6O # smOrje79pVLAr7JJSMH2bqFv9cKtLu77HndSVtswkRRMhtDU+VQI5FxYlwueuawJ # toDM4DJMd3pJHIpPrUwxlo4D9dkdxPfqC1GATDPxw9/vYgbORn8fkt5g9EYxBzc0 # pWRY9SNuw0MC54JGEoFc77+VKJXK1A97j9GoF+Vyoh30yTgZ3q9tm2eElpYwtHDy # t8zEVC9QadcgMdRAnJqgZgaWdfwKiHpjplSn5lOGDLOo7gfSmik= # =ajVU # -----END PGP SIGNATURE----- # gpg: Signature made Thu 23 Jun 2022 09:53:16 AM PDT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] * tag 'pull-migration-20220623b' of https://gitlab.com/dagrh/qemu: (25 commits) migration: remove the QEMUFileOps abstraction migration: remove the QEMUFileOps 'get_return_path' callback migration: remove the QEMUFileOps 'writev_buffer' callback migration: remove the QEMUFileOps 'get_buffer' callback migration: remove the QEMUFileOps 'close' callback migration: remove the QEMUFileOps 'set_blocking' callback migration: remove the QEMUFileOps 'shut_down' callback migration: remove unused QEMUFileGetFD typedef / qemu_get_fd method migration: introduce new constructors for QEMUFile migration: hardcode assumption that QEMUFile is backed with QIOChannel migration: stop passing 'opaque' parameter to QEMUFile hooks migration: convert savevm to use QIOChannelBlock for VMState migration: introduce a QIOChannel impl for BlockDriverState VMState migration: rename qemu_file_update_transfer to qemu_file_acct_rate_limit migration: rename qemu_update_position to qemu_file_credit_transfer migration: rename qemu_ftell to qemu_file_total_transferred migration: rename 'pos' field in QEMUFile to 'bytes_processed' migration: rename rate limiting fields in QEMUFile migration: remove unreachble RDMA code in save_hook impl migration: switch to use QIOChannelNull for dummy channel ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2022-06-23 10:14:20 -07:00
Daniel P. Berrangé	77ef2dc1c8	migration: remove the QEMUFileOps abstraction Now that all QEMUFile callbacks are removed, the entire concept can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-23 10:18:13 +01:00
Daniel P. Berrangé	02bdbe172d	migration: remove the QEMUFileOps 'get_return_path' callback This directly implements the get_return_path logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-23 10:18:13 +01:00
Daniel P. Berrangé	ec2135eec8	migration: remove the QEMUFileOps 'writev_buffer' callback This directly implements the writev_buffer logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-23 10:18:13 +01:00
Daniel P. Berrangé	f759d7050b	migration: remove the QEMUFileOps 'get_buffer' callback This directly implements the get_buffer logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixup len = -EIO as spotted by Peter Xu	2022-06-23 10:17:58 +01:00
Daniel P. Berrangé	0ae1f7f055	migration: remove the QEMUFileOps 'close' callback This directly implements the close logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	80ad97069c	migration: remove the QEMUFileOps 'set_blocking' callback This directly implements the set_blocking logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	d3c581b750	migration: remove the QEMUFileOps 'shut_down' callback This directly implements the shutdown logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	0f58c3fcc7	migration: remove unused QEMUFileGetFD typedef / qemu_get_fd method Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	c0c6e1e2dd	migration: introduce new constructors for QEMUFile Prepare for the elimination of QEMUFileOps by introducing a pair of new constructors. This lets us distinguish between an input and output file object explicitly rather than via the existance of specific callbacks. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	2893a2884b	migration: hardcode assumption that QEMUFile is backed with QIOChannel The only callers of qemu_fopen_ops pass 'true' for the 'has_ioc' parameter, so hardcode this assumption in QEMUFile, by passing in the QIOChannel object as a non-opaque parameter. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed long line	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	365c0463db	migration: stop passing 'opaque' parameter to QEMUFile hooks The only user of the hooks is RDMA which provides a QIOChannel backed impl of QEMUFile. It can thus use the qemu_file_get_ioc() method. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	67bdabe2af	migration: convert savevm to use QIOChannelBlock for VMState With this change, all QEMUFile usage is backed by QIOChannel at last. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Wrap long lines	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	65cf200a51	migration: introduce a QIOChannel impl for BlockDriverState VMState Introduce a QIOChannelBlock class that exposes the BlockDriverState VMState region for I/O. This is kept in the migration/ directory rather than io/, to avoid a mutual dependancy between block/ <-> io/ directories. Also the VMState should only be used by the migration code. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed coding style in qio_channel_block_close	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	bc698c367d	migration: rename qemu_file_update_transfer to qemu_file_acct_rate_limit The qemu_file_update_transfer name doesn't give a clear guide on what its purpose is, and how it differs from the qemu_file_credit_transfer method. The latter is specifically for accumulating for total migration traffic, while the former is specifically for accounting in thue rate limit calculations. The new name give better guidance on its usage. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	1a93bd2f60	migration: rename qemu_update_position to qemu_file_credit_transfer The qemu_update_position method name gives the misleading impression that it is changing the current file offset. Most of the files are just streams, however, so there's no concept of a file offset in the general case. What this method is actually used for is to report on the number of bytes that have been transferred out of band from the main I/O methods. This new name better reflects this purpose. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Daniel P. Berrangé	fbfa6404e5	migration: rename qemu_ftell to qemu_file_total_transferred The name 'ftell' gives the misleading impression that the QEMUFile objects are seekable. This is not the case, as in general we just have an opaque stream. The users of this method are only interested in the total bytes processed. This switches to a new name that reflects the intended usage. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Wrapped long line	2022-06-22 19:33:36 +01:00

1 2 3 4 5 ...

96814 Commits