mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Christoph Müllner	af99aa72ef	RISC-V: Adding T-Head MemPair extension This patch adds support for the T-Head MemPair instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-9-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	b8a5832b87	RISC-V: Adding T-Head multiply-accumulate instructions This patch adds support for the T-Head MAC instructions. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-8-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	3290933853	RISC-V: Adding XTheadCondMov ISA extension This patch adds support for the XTheadCondMov ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-7-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	fa13458546	RISC-V: Adding XTheadBs ISA extension This patch adds support for the XTheadBs ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-6-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	426c049196	RISC-V: Adding XTheadBb ISA extension This patch adds support for the XTheadBb ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-5-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	c9410a689f	RISC-V: Adding XTheadBa ISA extension This patch adds support for the XTheadBa ISA extension. The patch uses the T-Head specific decoder and translation. Co-developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-4-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	134c3ffa34	RISC-V: Adding XTheadSync ISA extension This patch adds support for the XTheadSync ISA extension. The patch uses the T-Head specific decoder and translation. The implementation introduces a helper to execute synchronization tasks: helper_tlb_flush_all() performs a synchronized TLB flush on all CPUs. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Message-Id: <20230131202013.2541053-3-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Christoph Müllner	49a7f3aabb	RISC-V: Adding XTheadCmo ISA extension This patch adds support for the XTheadCmo ISA extension. To avoid interfering with standard extensions, decoder and translation are in its own xthead* specific files. Future patches should be able to easily add additional T-Head extension. The implementation does not have much functionality (besides accepting the instructions and not qualifying them as illegal instructions if the hart executes in the required privilege level for the instruction), as QEMU does not model CPU caches and instructions are documented to not raise any exceptions. Co-developed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230131202013.2541053-2-christoph.muellner@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	4b402886ac	hw/riscv: change riscv_compute_fdt_addr() semantics As it is now, riscv_compute_fdt_addr() is receiving a dram_base, a mem_size (which is defaulted to MachineState::ram_size in all boards) and the FDT pointer. And it makes a very important assumption: the DRAM interval dram_base + mem_size is contiguous. This is indeed the case for most boards that use a FDT. The Icicle Kit board works with 2 distinct RAM banks that are separated by a gap. We have a lower bank with 1 GiB size, a gap follows, then at 64 GiB the high memory starts. MachineClass::default_ram_size for this board is set to 1.5 GiB, and machine_init() is enforcing it as minimal RAM size, meaning that we'll always have at least 512 MiB in the Hi RAM area. Using riscv_compute_fdt_addr() in this board is weird because not only does the board have sparse RAM, and it's calling it using the base address of the Lo RAM area, but it's also using a mem_size that we have guarantees that it will go up to the Hi RAM. None of the function's assumptions hold for this board. In fact, what makes the function work at all in this case is a coincidence. Commit `1a475d39ef` introduced a 3 GiB boundary for the FDT, down from 4 GiB, that is enforced if dram_base is lower than 3072 MiB. For the Icicle Kit board, memmap[MICROCHIP_PFSOC_DRAM_LO].base is 0x80000000 (2 GiB) and it has a 1 GiB size, so it will fall in the conditions to put the FDT under a 3 GiB address, which happens to be exactly at the end of DRAM_LO. If the base address of the Lo area started later than 3 GiB this function would be unusable by the board. Changing any assumptions inside riscv_compute_fdt_addr() can also break it by accident.
Let's change riscv_compute_fdt_addr() semantics to be appropriate to the Icicle Kit board and for future boards that might have sparse RAM topologies to worry about: - relax the condition that the dram_base + mem_size area is contiguous, since this is already not the case today; - receive an extra 'dram_size' size attribute that refers to a contiguous RAM block that the board wants the FDT to reside on. Together with 'mem_size' and 'fdt', which are now being consumed by a MachineState pointer, we're able to make clear assumptions based on the DRAM block and total mem_size available to ensure that the FDT will be put in a valid RAM address. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230201171212.1219375-4-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	bc2c015353	hw/riscv: split fdt address calculation from fdt load A common trend in other archs is to calculate the fdt address, which is usually straightforward, and then call a function that loads the fdt/dtb by using that address. riscv_load_fdt() is doing a bit too much in comparison. It's calculating the fdt address via an elaborate heuristic to put the FDT at the bottom of DRAM, and "bottom of DRAM" will vary across boards and configurations, then it's actually loading the fdt, and finally it's returning the fdt address used to the caller. Reduce the existing complexity of riscv_load_fdt() by splitting its code into a new function, riscv_compute_fdt_addr(), that will take care of all fdt address logic. riscv_load_fdt() can then be a simple function that just loads a fdt at the given fdt address. We're also taking the opportunity to clarify the intentions and assumptions made by these functions. riscv_load_fdt() is now receiving a hwaddr as fdt_addr because nothing prevents the fdt from being loaded at higher addresses that don't fit in a uint32_t. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230201171212.1219375-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	909f7da604	hw/riscv/boot.c: calculate fdt size after fdt_pack() fdt_pack() can change the fdt size, meaning that fdt_totalsize() can contain a now deprecated (bigger) value. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230201171212.1219375-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Sergey Matyukevich	26934f9a95	target/riscv: set tval for triggered watchpoints According to privileged spec, if [sm]tval is written with a nonzero value when a breakpoint exception occurs, then [sm]tval will contain the faulting virtual address. Set tval to hit address when breakpoint exception is triggered by hardware watchpoint. Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Reviewed-by: Bin Meng <bmeng@tinylab.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230131170955.752743-1-geomatsi@gmail.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	606a2439ba	hw/riscv/spike.c: rename MachineState 'mc' pointers to 'ms' Follow the QEMU convention of naming MachineState pointers as 'ms' by renaming the instances where we're calling it 'mc'. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-4-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	568e0614d0	hw/riscv/virt.c: rename MachineState 'mc' pointers to 'ms' We have a convention in other QEMU boards/archs to name MachineState pointers as either 'machine' or 'ms'. MachineClass pointers are usually called 'mc'. The 'virt' RISC-V machine has a lot of instances where MachineState pointers are named 'mc'. There is nothing wrong with that, but we gain more compatibility with the rest of the QEMU code base, and easier reviews, if we follow QEMU conventions. Rename all 'mc' MachineState pointers to 'ms'. This is a very tedious and mechanical patch that was produced by doing the following: - find/replace all 'MachineState mc' to 'MachineState ms'; - find/replace all 'mc->fdt' to 'ms->fdt'; - find/replace all 'mc->smp.cpus' to 'ms->smp.cpus'; - replace any remaining occurrences of 'mc' that the compiler complained about. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Daniel Henrique Barboza	2967f37d44	hw/riscv/virt.c: calculate socket count once in create_fdt_imsic() riscv_socket_count() returns either ms->numa_state->num_nodes or 1 depending on NUMA support. In any case the value can be retrieved only once and used in the rest of the function. This will also alleviate the rename we're going to do next by reducing the instances of MachineState 'mc' inside hw/riscv/virt.c. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230124212234.412630-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Anup Patel	f008a2d218	target/riscv: Ensure opcode is saved for all relevant instructions We should call decode_save_opc() for all relevant instructions which can potentially generate a virtual instruction fault or a guest page fault because generating transformed instruction upon guest page fault expects opcode to be available. Without this, hypervisor will see transformed instruction as zero in htinst CSR for guest MMIO emulation which makes MMIO emulation in hypervisor slow and also breaks nested virtualization. Fixes: `a9814e3e08` ("target/riscv: Minimize the calls to decode_save_opc") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-5-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Anup Patel	ae0edf2188	target/riscv: No need to re-start QEMU timer when timecmp == UINT64_MAX The time CSR will wrap-around immediately after reaching UINT64_MAX so we don't need to re-start QEMU timer when timecmp == UINT64_MAX in riscv_timer_write_timecmp(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-4-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:23 +10:00
Anup Patel	14cb78bfaf	target/riscv: Don't clear mask in riscv_cpu_update_mip() for VSTIP Instead of clearing mask in riscv_cpu_update_mip() for VSTIP, we should call riscv_cpu_update_mip() with mask == 0 from timer_helper.c for VSTIP. Fixes: `3ec0fe18a3` ("target/riscv: Add vstimecmp support") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-3-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:22 +10:00
Anup Patel	2cfb3b6c9b	target/riscv: Update VS timer whenever htimedelta changes The htimedelta[h] CSR has impact on the VS timer comparison so we should call riscv_timer_write_timecmp() whenever htimedelta changes. Fixes: `3ec0fe18a3` ("target/riscv: Add vstimecmp support") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120125950.2246378-2-apatel@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:22 +10:00
Alistair Francis	32c435a1ae	hw/riscv: boot: Don't use CSRs if they are disabled If the CSRs and CSR instructions are disabled because the Zicsr extension isn't enabled then we want to make sure we don't run any CSR instructions in the boot ROM. This patch removes the CSR instructions from the reset-vec if the extension isn't enabled. We replace the instruction with a NOP instead. Note that we don't do this for the SiFive U machine, as we are modelling the hardware in that case. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1447 Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-Id: <20230123035754.75553-1-alistair.francis@opensource.wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:22 +10:00
Wilfred Mallawa	7ae7146287	include/hw/riscv/opentitan: update opentitan IRQs Updates the opentitan IRQs to match the latest supported commit of Opentitan from TockOS. OPENTITAN_SUPPORTED_SHA := 565e4af39760a123c59a184aa2f5812a961fde47 Memory layout as per [1] [1] `565e4af397/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h` Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230123063619.222459-1-wilfred.mallawa@opensource.wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:22 +10:00
Philipp Tomsich	3de1fb712a	target/riscv: update disas.c for xnor/orn/andn and slli.uw The decoding of the following instructions from Zb[abcs] currently contains decoding/printing errors: * xnor,orn,andn: the rs2 operand is not being printed * slli.uw: decodes and prints the immediate shift-amount as a register (e.g. 'shift-by-2' becomes 'sp') instead of interpreting this as an immediate This commit updates the instruction descriptions to use the appropriate decoding/printing formats. Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20230120151551.1022761-1-philipp.tomsich@vrull.eu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>	2023-02-07 08:19:22 +10:00
Jiang Jiacheng	1b1f4ab69c	migration: save/delete migration thread info To support querying migration thread information, save and delete thread (live_migration and multifdsend) information at thread creation and exit. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Jiang Jiacheng	671326201d	migration: Introduce interface query-migrationthreads Introduce interface query-migrationthreads. The interface is used to query information about migration threads and returns with migration thread's name and its id. Introduce threadinfo.c to manage threads with migration. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Zhenzhong Duan	ebfc578715	multifd: Fix flush of zero copy page send request Make the IO channel flush call after the inflight request has been drained in the multifd thread, or else we may fail to flush the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Zhenzhong Duan	ddbe628c97	multifd: Fix a race on reading MultiFDPages_t.block In multifd_queue_page() MultiFDPages_t.block is checked twice. Between the two checks, MultiFDPages_t.block may be reset to NULL by the multifd thread. This leads to the 2nd check always being true, and then a redundant page is submitted to the multifd thread again. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
manish.mishra	6720c2b327	migration: check magic value for deciding the mapping of channels Current logic assumes that channel connections on the destination side are always established in the same order as the source and the first one will always be the main channel followed by the multifd or post-copy preemption channel. This may not always be true, as even if a channel has a connection established on the source side it can be in the pending state on the destination side and a newer connection can be established first, causing out-of-order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number; this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy (multifd) live migration, as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migrations already do the tls handshake before creating other channels, so this issue is not possible with tls, hence this logic is avoided for tls live migrations. This patch uses read peek to check the magic number of channels so that current data/control stream management remains unaffected. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
manish.mishra	84615a19dd	io: Add support for MSG_PEEK for socket channel MSG_PEEK peeks at the channel. The data is treated as unread and the next read shall still return this data. This support is currently added only for socket class. Extra parameter 'flags' is added to io_readv calls to pass extra read flags like MSG_PEEK. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Zhenzhong Duan	bd9510d385	migration/dirtyrate: Show sample pages only in page-sampling mode The value of "Sample Pages" is confusing in modes other than page-sampling. See below: (qemu) calc_dirty_rate -b 10 520 (qemu) info dirty_rate Status: measuring Start Time: 11646834 (ms) Sample Pages: 520 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: (not ready) (qemu) info dirty_rate Status: measured Start Time: 11646834 (ms) Sample Pages: 0 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: 2 (MB/s) Since it's totally useless in dirty-ring and dirty-bitmap mode, show it only in page-sampling mode. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Dr. David Alan Gilbert	bb25a72895	migration: Perform vmsd structure check during tests Perform a check on vmsd structures during test runs in the hope of catching any missing terminators and other simple screwups. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Dr. David Alan Gilbert	89c5684891	migration: Add canary to VMSTATE_END_OF_LIST We fairly regularly forget VMSTATE_END_OF_LIST markers off descriptions; given that the current check is only for ->name being NULL, sometimes we get unlucky and the code apparently works and no one spots the error. Explicitly add a flag, VMS_END that should be set, and assert it is set during the traversal. Note: This can't go in until we update the copy of vmstate.h in slirp. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Fiona Ebner	74ecf6ac2b	migration/rdma: fix return value for qio_channel_rdma_{readv,writev} upon errors. As the documentation in include/io/channel.h states, only -1 and QIO_CHANNEL_ERR_BLOCK should be returned upon error. Other values have the potential to confuse the call sites. error_setg is used rather than error_setg_errno, because there are certain code paths where -1 (as a non-errno) is propagated up (e.g. starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control) all the way to qio_channel_rdma_{readv,writev}. Similar to `a216ec85b7` ("migration/channel-block: fix return value for qio_channel_block_{readv,writev}"). Suggested-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Peter Xu	db18dee7d7	migration: Show downtime during postcopy phase The downtime should be displayed during postcopy phase because the switchover phase is done. OTOH it's weird to show "expected downtime", whose meaning is confusing if the switchover has already happened anyway. This is a slight ABI change on QMP, but I assume it shouldn't affect anyone. Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	d71920d425	virtio-mem: Proper support for preallocation with migration Ordinary memory preallocation runs when QEMU starts up and creates the memory backends, before processing the incoming migration stream. With virtio-mem, we don't know which memory blocks to preallocate before migration started. Now that we migrate the virtio-mem bitmap early, before migrating any RAM content, we can safely preallocate memory for all plugged memory blocks before migrating any RAM content. This is especially relevant for the following cases: (1) User errors With hugetlb/files, if we don't have sufficient backend memory available on the migration destination, we'll crash QEMU (SIGBUS) during RAM migration when running out of backend memory. Preallocating memory before actual RAM migration allows for failing gracefully and informing the user about the setup problem. (2) Excluded memory ranges during migration For example, virtio-balloon free page hinting will exclude some pages from getting migrated. In that case, we won't crash during RAM migration, but later, when running the VM on the destination, which is bad. To fix this for new QEMU machines that migrate the bitmap early, preallocate the memory early, before any RAM migration. Warn with old QEMU machines. Getting postcopy right is a bit tricky, but we essentially now implement the same (problematic) preallocation logic as ordinary preallocation: preallocate memory early and discard it again before precopy starts. During ordinary preallocation, discarding of RAM happens when postcopy is advised. As the state (bitmap) is loaded after postcopy was advised but before postcopy starts listening, we have to discard memory we preallocated immediately again ourselves. Note that nothing (not even hugetlb reservations) guarantees for postcopy that backend memory (especially, hugetlb pages) is still free after it was freed once while discarding RAM.
Still, allocating that memory at least once helps catch some basic setup problems. Before this change, trying to restore a VM when insufficient hugetlb pages are around results in the process crashing due to a "Bus error" (SIGBUS). With this change, QEMU fails gracefully: qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early' qemu-system-x86_64: load of migration failed: Cannot allocate memory And we can even introspect the early migration data, including the bitmap: $ ./scripts/analyze-migration.py -f STATEFILE { "ram (2)": { "section sizes": { "0000:00:03.0/mem0": "0x0000000780000000", "0000:00:04.0/mem1": "0x0000000780000000", "pc.ram": "0x0000000100000000", "/rom@etc/acpi/tables": "0x0000000000020000", "pc.bios": "0x0000000000040000", "0000:00:02.0/e1000.rom": "0x0000000000040000", "pc.rom": "0x0000000000020000", "/rom@etc/table-loader": "0x0000000000001000", "/rom@etc/acpi/rsdp": "0x0000000000001000" } }, "0000:00:03.0/virtio-mem-device-early (51)": { "tmp": "00 00 00 01 40 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00", "size": "0x0000000040000000", "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...] }, "0000:00:04.0/virtio-mem-device-early (53)": { "tmp": "00 00 00 08 c0 00 00 00 00 00 00 07 80 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00", "size": "0x00000001fa400000", "bitmap": "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [...] }, [...] Reported-by: Jing Qi <jinqi@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	3b95a71b22	virtio-mem: Migrate immutable properties early The bitmap and the size are immutable while migration is active: see virtio_mem_is_busy(). We can migrate this information early, before migrating any actual RAM content. Further, all information we need for sanity checks is immutable as well. Having this information in place early will, for example, allow for properly preallocating memory before touching these memory locations during RAM migration: this way, we can make sure that all memory was actually preallocated and that any user errors (e.g., insufficient hugetlb pages) can be handled gracefully. In contrast, usable_region_size and requested_size can theoretically still be modified on the source while the VM is running. Keep migrating these properties the usual, late, way. Use a new device property to keep behavior of compat machines unmodified. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	ce1761f0f9	virtio-mem: Fail if a memory backend with "prealloc=on" is specified "prealloc=on" for the memory backend does not work as expected, as virtio-mem will simply discard all preallocated memory immediately again. In the best case, it's an expensive NOP. In the worst case, it's an unexpected allocation error. Instead, "prealloc=on" should be specified for the virtio-mem device only, such that virtio-mem will try preallocating memory before plugging memory dynamically to the guest. Fail if such a memory backend is provided. Tested-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	80fe315c38	migration/ram: Factor out check for advised postcopy Let's factor out this check, to be used in virtio-mem context next. While at it, fix a spelling error in a related comment. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	508f7988fd	migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() We'll make use of both next in the context of virtio-mem. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	62f42625d4	migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) For virtio-mem, we want to have the plugged/unplugged state of memory blocks available before migrating any actual RAM content, and perform sanity checks before touching anything on the destination. This information is immutable on the migration source while migration is active. We want to use this information for proper preallocation support with migration: currently, we don't preallocate memory on the migration target, and especially with hugetlb, we can easily run out of hugetlb pages during RAM migration and will crash (SIGBUS) instead of catching this gracefully via preallocation. Migrating device state via a VMSD before we start iterating is currently impossible: the only approach that would be possible is avoiding a VMSD and migrating state manually during save_setup(), to be restored during load_state(). Let's allow for migrating device state via a VMSD early, during the setup phase in qemu_savevm_state_setup(). To keep it simple, we indicate applicable VMSD's using an "early_setup" flag. Note that only very selected devices (i.e., ones seriously messing with RAM setup) are supposed to make use of such early state migration. While at it, also use a bool for the "unmigratable" member. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	e3bf5e68e2	migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() ... and store it in the migration state. This is a preparation for storing selected VMSDs already in qemu_savevm_state_setup(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	5e104f24e7	migration/savevm: Move more savevm handling into vmstate_save() Let's move more code into vmstate_save(), reducing code duplication and preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We have to move vmstate_save() to make the compiler happy. We'll now also trace from qemu_save_device_state(), triggering the same tracepoints as previously called from qemu_savevm_state_complete_precopy_non_iterable() only. Note that qemu_save_device_state() ignores iterable device state, such as RAM, and consequently doesn't trigger some other trace points (e.g., trace_savevm_state_setup()). Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	e41c57702e	migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager ram_block_populate_read() already optimizes for RamDiscardManager. However, ram_write_tracking_start() will still try protecting discarded memory ranges. Let's optimize, because discarded ranges don't map any pages and (1) For anonymous memory, trying to protect using uffd-wp without a mapped page is ignored by the kernel and consequently a NOP. (2) For shared/file-backed memory, we will fill present page tables in the range with PTE markers. However, we will even allocate page tables just to fill them with unnecessary PTE markers and effectively waste memory. So let's exclude these ranges, just like ram_block_populate_read() already does. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	59bcc049c1	migration/ram: Rely on used_length for uffd_change_protection() ram_mig_ram_block_resized() will abort migration (including background snapshots) when resizing a RAMBlock. ram_block_populate_read() will only populate RAM up to used_length, so at least for anonymous memory protecting everything between used_length and max_length won't actually be protected and is just a NOP. So let's only protect everything up to used_length. Note: it still makes sense to register uffd-wp for max_length, such that RAM_UF_WRITEPROTECT is independent of a changing used_length. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	7cc8e9e0fa	migration/ram: Don't explicitly unprotect when unregistering uffd-wp When unregistering uffd-wp, older kernels before commit f369b07c86143 ("mm/uffd:reset write protection when unregister with wp-mode") won't clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous uffd-wp PTE bits would trigger again. With above commit, the kernel will clear the uffd-wp PTE bits when unregistering itself. Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we don't care about clearing them at all: a new background snapshot will re-register uffd-wp and re-protect all memory either way. So let's skip the manual clearing of uffd-wp. If ever relevant, we could clear conditionally in uffd_unregister_memory() -- we just need a way to figure out more recent kernels. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	72ef3a3708	migration/ram: Fix error handling in ram_write_tracking_start() If something goes wrong during uffd_change_protection(), we would fail to unregister uffd-wp and not release our reference. Fix it by performing the uffd_change_protection(true) last. Note that a uffd_change_protection(false) on the recovery path without a prior uffd_change_protection(true) is fine. Fixes: `278e2f551a` ("migration: support UFFD write fault processing in ram_save_iterate()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	5f19a44919	migration/ram: Fix populate_read_range() Unfortunately, commit `f7b9dcfbcf` broke populate_read_range(): the loop end condition is very wrong, resulting in that function not populating the full range. Let's fix that. Fixes: `f7b9dcfbcf` ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Peter Xu	d5890ea072	util/userfaultfd: Add uffd_open() Add a helper to create the uffd handle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	d9df92925e	migration: simplify migration_iteration_run() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	fd70385d38	migration: Remove unused threshold_size parameter Until the previous commit, save_live_pending() was used for ram. Now with the split into state_pending_estimate() and state_pending_exact() it is not needed anymore, so remove it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	c8df4a7aef	migration: Split save_live_pending() into state_pending_* We split the function into two: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state_pending_exact: We calculate the exact amount of remaining state. The only "device" that implements different functions for _estimate() and _exact() is ram. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00

101336 Commits