As of today, when booting upstream U-Boot for Xilinx Zynq, the UART
does not receive anything. Debugging shows that the UART input clock
frequency is zero which prevents the UART from receiving anything as
per the logic in uart_receive().
From zynq_slcr_reset_exit() comment, it intends to compute output
clocks according to ps_clk and registers. zynq_slcr_compute_clocks()
is called to accomplish the task, inside which device_is_in_reset()
is called to actually make the attempt in vain.
Rework reset_hold() and reset_exit() so that in the reset exit phase,
the logic can really compute output clocks in reset_exit().
With this change, upstream U-Boot boots properly again with:
$ qemu-system-arm -M xilinx-zynq-a9 -m 1G -display none -serial null -serial stdio \
-device loader,file=u-boot-dtb.bin,addr=0x4000000,cpu-num=0
Fixes: 38867cb7ec ("hw/misc/zynq_slcr: add clock generation for uarts")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20210901124521.30599-2-bmeng.cn@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A PS/2 keyboard has a separate command reply queue that is
independent of the key queue. This prevents that command replies
and keyboard input mix. Keyboard command replies take precedence
over queued keystrokes. A new keyboard command removes any
remaining command replies from the command reply queue.
Implement a separate keyboard command reply queue and clear the
command reply queue before command execution. This brings the
PS/2 keyboard emulation much closer to a real PS/2 keyboard.
The command reply queue is located in a few free bytes directly
in front of the scancode queue. Because the scancode queue has
a maximum length of 16 bytes there are 240 bytes available for
the command reply queue. At the moment only a maximum of 3 bytes
are required. For compatibility reasons rptr, wptr and count kept
their function. rptr is the start, wptr is the end and count is
the length of the entire keyboard queue. The new variable cwptr
is the end of the command reply queue or -1 if the queue is
empty. To write to the command reply queue, rptr is moved
backward by the number of required bytes and the command replies
are written to the buffer starting at the new rptr position.
After writing, cwptr is at the old rptr position. Copying cwptr
to rptr clears the command reply queue. The command reply queue
can't overflow because each new keyboard command clears the
command reply queue.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/501
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/502
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-Id: <20210810133258.8231-2-vr_qemu@t-online.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Extend the used ps2 buffer size to the available buffer size but
keep the maximum ps2 queue size.
The next patch needs a few bytes of the larger buffer size.
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Message-Id: <20210810133258.8231-1-vr_qemu@t-online.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
These will soon be required to enable nubus devices to support interrupts.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Now that q800 VIA1 and VIA2 are completely separate devices there is no need to
add a specific device prefix to ensure that the IRQ lines remain separate.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210830102447.10806-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Remove the mac_via device and wire up both q800 VIA1 and VIA2 directly for the
m68k q800 machine.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
After this change mac_via_reset() is now empty and can be removed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210830102447.10806-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
These variables are already present in MOS6522Q800VIA1State and so it is just
the VMStateDescription move that is needed.
With this change the mac_via VMStateDescription is now empty and can be removed
completely.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The ADB is accessed using clock and data pins on q800 VIA1 port B and so can be
moved to MOS6522Q800VIA1State.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210830102447.10806-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The PRAM/RTC is accessed using clock and data pins on q800 VIA1 port B and so
can be moved to MOS6522Q800VIA1State.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210830102447.10806-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The PRAM contents are accessed using clock and data pins on q800 VIA1 port B
and so can be moved to MOS6522Q800VIA1State.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This variable is already present in MOS6522Q800VIA1State and can be moved
immediately into the q800 VIA1 VMStateDescription.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Move the parent mos6522 objects from vmstate_mac_via into the new VMStateDescription
structures to begin the process of splitting MacVIAState into separate VIA1 and
VIA2 devices.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210830102447.10806-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
After an SDLC "Enter hunt" command has been sent the STATUS_SYNC bit should remain
high until the flag byte has been detected. Whilst the ESCC device doesn't yet
implement SDLC mode, without this change the active low STATUS_SYNC is constantly
asserted causing the MacOS OpenTransport extension to hang on startup as it thinks
it is constantly receiving LocalTalk responses during its initial negotiation
phase.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This removes duplication of the internal device state initialisation between
device reset and soft reset.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Now that register values at reset are handled elsewhere for all of device reset,
soft reset and hard reset, escc_reset_chn() only needs to handle initialisation
of internal device state.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
The hardware reset differs from a device reset in that it only changes the contents
of specific registers. Remove the code that resets all the registers to zero during
hardware reset and implement the default values using the existing soft reset code
with the additional changes listed in the table in the "Z85C30 Reset" section.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
The software reset differs from a device reset in that it only changes the contents
of specific registers. Remove the code that resets all the registers to zero during
soft reset and implement the default values listed in the table in the "Z85C30 Reset"
section.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This new hardware reset function is to be called for both channels when the
hardware reset bit is written to register WR9. Its initial implementation is
the same as the existing escc_reset_chn() function used for device reset.
Add a new trace event when the guest initiates a hard reset via the WR9 register
to help diagnose guest reset issues.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This new software reset function is to be called when the appropriate channel
software reset bit is written to register WR9. Its initial implementation is
the same as the existing escc_reset_chn() function used for device reset.
Add a new trace event when the guest initiates a soft reset via the WR9 register
to help diagnose guest reset issues.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
This is to ensure that a device reset always returns the ESCC to a known state.
Note that this is currently redundant with the same code in escc_reset_chn()
but that will change shortly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Also fix a couple of spelling mistakes in comments.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20210903113223.19551-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Commit 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property") changed
the sun4m CPU reset code to use the start-powered-off property and so split the
creation of the CPU into separate instantiation and realization phases to enable
the new start-powered-off property to be set.
This accidentally broke sun4m machines with more than one CPU present since
sparc_cpu_realizefn() sets a default CPU id, and now that realization occurs after
calling cpu_sparc_set_id() in cpu_devinit() the CPU id gets reset back to the
default instead of being uniquely encoded based upon the CPU number. As soon as
another CPU is brought online, the OS gets confused between them and promptly
panics.
Resolve the issue by moving the cpu_sparc_set_id() call in cpu_devinit() to after
the point where the CPU device has been realized as before.
Fixes: 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210825095100.20180-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Add the new gen16 features to the default model and fence them for
machine version 6.1 and earlier.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210907101017.27126-1-borntraeger@de.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The PAGE_SIZE macro is causing trouble on Alpine Linux since it
clashes with a macro from a system header there. We already have
the TARGET_PAGE_SIZE, TARGET_PAGE_MASK and TARGET_PAGE_BITS macros
in QEMU anyway, so let's simply replace the PAGE_SIZE, PAGE_MASK
and PAGE_SHIFT macro with their TARGET_* counterparts.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/572
Message-Id: <20210901125800.611183-1-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Let's enable storage keys lazily under TCG, just as we do under KVM.
Only fairly old Linux versions actually make use of storage keys, so it
can be kind of wasteful to allocate quite some memory and track
changes and references if nobody cares.
We have to make sure to flush the TLB when enabling storage keys after
the VM was already running: otherwise it might happen that we don't
catch references or modifications afterwards.
Add proper documentation to all callbacks.
The kvm-unit-tests skey tests keeps on working with this change.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210903155514.44772-14-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
... and make it return a bool instead.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210903155514.44772-13-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Let's validate the given address and report a proper error in case it's
not. All call paths now properly check the validity of the given GFN.
Remove the TODO.
The errors inside the getter and setter should only trigger if something
really goes wrong now, for example, with a broken migration stream. Or
when we forget to update the storage key allocation with memory hotplug.
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210903155514.44772-12-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Handle it similar to migration. Assert that we're holding the BQL, to
make sure we don't see concurrent modifications.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210903155514.44772-11-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Let's use the guest_phys_blocks API to get physical memory regions
that are well defined inside our physical address space and migrate the
storage keys of these.
This is a preparation for having memory besides initial ram defined in
the guest physical address space, for example, via memory devices. We
get rid of the ms->ram_size dependency.
Please note that we will usually have very little (--> 1) physical
ranges. With virtio-mem might have significantly more ranges in the
future. If that turns out to be a problem (e.g., total memory
footprint of the list), we could look into a memory mapping
API that avoids creation of a list and instead triggers a callback for
each range.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210903155514.44772-10-david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
hsch and csch basically have two parts: execute the command,
and perform the halt/clear function. For fully emulated
subchannels, it is pretty clear how it will work: check the
subchannel state, and actually 'perform the halt/clear function'
and set cc 0 if everything looks good.
For passthrough subchannels, some of the checking is done
within QEMU, but some has to be done within the kernel. QEMU's
subchannel state may be such that we can perform the async
function, but the kernel may still get a cc != 0 when it is
actually executing the instruction. In that case, we need to
set the condition actually encountered by the kernel; if we
set cc 0 on error, we would actually need to inject an interrupt
as well.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Jared Rossi <jrossi@linux.ibm.com>
Message-Id: <20210705163952.736020-2-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch switches to initialize dev.nvqs from the VhostNetOptions
instead of assuming it was 2. This is useful for implementing control
virtqueue support which will be a single vhost_net structure with a
single cvq.
Note that nvqs is still set to 2 for all users and this patch does not
change functionality.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210903091031.47303-6-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The nvqs and vqs have been initialized during vhost_net_init() and are
not expected to change during the life cycle of vhost_net
structure. So this patch removes the meaningless assignment.
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210903091031.47303-4-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We should return error code instead of zero, otherwise there's no way
for the caller to detect the failure.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210903091031.47303-3-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Qemu will crash on vhost backend unexpected exit and re-connect │
in some case due to access released memory.
Signed-off-by: Yuwei Zhang <zhangyuwei.9149@bytedance.com>
Message-Id: <20210830123433.45727-1-zhangyuwei.9149@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_free_region_cache() is called within call_rcu(),
always with a non-NULL argument. Ensure new code keep it
that way by replacing the NULL check by an assertion.
Add a comment this function is called within call_rcu().
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210826172658.2116840-3-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
While virtio_queue_packed_empty_rcu() uses the '_rcu' suffix,
it is not obvious it is called within rcu_read_lock(). All other
functions from this file called with the RCU locked have a comment
describing it. Document this one similarly for consistency.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210826172658.2116840-2-philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
There is no need to use fresh typecasts to get references to pci device structs
when there is an existing reference to pci device struct. Use existing reference.
Minor cleanup.
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210825031949.919376-3-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
commit c0e427d6eb ("hw/acpi/ich9: Enable ACPI PCI hot-plug") removed all
uses of find_i440fx() function. This has been replaced by the more generic call
acpi_get_i386_pci_host() which maybe able to find the root bus both for i440fx
machine type as well as for the q35 machine type. There seems to be no more any
need to maintain a i440fx specific version of the api call. Remove it.
Tested by building from a clean tree successfully.
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210825031949.919376-2-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Since commits aa57020774 ("numa: move numa global variable
nb_numa_nodes into MachineState") and 7e721e7b10 ("numa: move
numa global variable numa_info into MachineState"), we can get
NUMA information completely from MachineState::numa_state.
Remove PCMachineState::numa_nodes and PCMachineState::node_mem,
since they are just copied from MachineState::numa_state.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Message-Id: <20210823011254.28506-1-jingqi.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Vhost used to compare the dma_as against the address_space_memory to
detect whether the IOMMU is enabled or not. This might not work well
since the virito-bus may call get_dma_as if VIRTIO_F_IOMMU_PLATFORM is
set without an actual IOMMU enabled when device is plugged. In the
case of PCI where pci_get_address_space() is used, the bus master as
is returned. So vhost actually tries to enable device IOTLB even if
the IOMMU is not enabled. This will lead a lots of unnecessary
transactions between vhost and Qemu and will introduce a huge drop of
the performance.
For PCI, an ideal approach is to use pci_device_iommu_address_space()
just for get_dma_as. But Qemu may choose to initialize the IOMMU after
the virtio-pci which lead a wrong address space is returned during
device plugged. So this patch switch to use transport specific way via
iommu_enabled() to detect the IOMMU during vhost start. In this case,
we are fine since we know the IOMMU is initialized correctly.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210804034803.1644-4-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch implements the PCI transport version of iommu_enabled. This
is done by comparing the address space returned by
pci_device_iommu_address_space() against address_space_memory.
Note that an ideal approach is to use pci_device_iommu_address_space()
in get_dma_as(), but it might not work well since the IOMMU could be
initialized after the virtio-pci device is initialized.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210804034803.1644-3-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch introduce a new method for the virtio-bus for the transport
to report whether or not the IOMMU is enabled for the device.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20210804034803.1644-2-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's compress the code a bit to improve readability. We can drop the
vm_running check in virtio_balloon_free_page_start() as it's already
properly checked in the single caller.
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210708095339.20274-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Postcopy never worked properly with 'free-page-hint=on', as there are
at least two issues:
1) With postcopy, the guest will never receive a VIRTIO_BALLOON_CMD_ID_DONE
and consequently won't release free pages back to the OS once
migration finishes.
The issue is that for postcopy, we won't do a final bitmap sync while
the guest is stopped on the source and
virtio_balloon_free_page_hint_notify() will only call
virtio_balloon_free_page_done() on the source during
PRECOPY_NOTIFY_CLEANUP, after the VM state was already migrated to
the destination.
2) Once the VM touches a page on the destination that has been excluded
from migration on the source via qemu_guest_free_page_hint() while
postcopy is active, that thread will stall until postcopy finishes
and all threads are woken up. (with older Linux kernels that won't
retry faults when woken up via userfaultfd, we might actually get a
SEGFAULT)
The issue is that the source will refuse to migrate any pages that
are not marked as dirty in the dirty bmap -- for example, because the
page might just have been sent. Consequently, the faulting thread will
stall, waiting for the page to be migrated -- which could take quite
a while and result in guest OS issues.
While we could fix 1) comparatively easily, 2) is harder to get right and
might require more involved RAM migration changes on source and destination
[1].
As it never worked properly, let's not start free page hinting in the
precopy notifier if the postcopy migration capability was enabled to fix
it easily. Capabilities cannot be enabled once migration is already
running.
Note 1: in the future we might either adjust migration code on the source
to track pages that have actually been sent or adjust
migration code on source and destination to eventually send
pages multiple times from the source and and deal with pages
that are sent multiple times on the destination.
Note 2: virtio-mem has similar issues, however, access to "unplugged"
memory by the guest is very rare and we would have to be very
lucky for it to happen during migration. The spec states
"The driver SHOULD NOT read from unplugged memory blocks ..."
and "The driver MUST NOT write to unplugged memory blocks".
virtio-mem will move away from virtio_balloon_free_page_done()
soon and handle this case explicitly on the destination.
[1] https://lkml.kernel.org/r/e79fd18c-aa62-c1d8-c7f3-ba3fc2c25fc8@redhat.com
Fixes: c13c4153f7 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: qemu-stable@nongnu.org
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210708095339.20274-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
OBJECT_CHECK(PciHostState, ..., TYPE_PCI_HOST_BRIDGE) is exactly
what the PCI_HOST_BRIDGE macro does. We can just use the macro
instead of using OBJECT_CHECK manually.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210805193431.307761-7-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>