In spapr_machine_init() we clamp the size of the RMA to 16GiB and the
comment saying why doesn't make a whole lot of sense. In fact, this was
done because the real mode handling code elsewhere limited the RMA in TCG
mode to the maximum value configurable in LPCR[RMLS], 16GiB.
But,
* Actually LPCR[RMLS] has been able to encode a 256GiB size for a very
long time, we just didn't implement it properly in the softmmu
* LPCR[RMLS] shouldn't really be relevant anyway, it only was because we
used to abuse the RMOR based translation mode in order to handle the
fact that we're not modelling the hypervisor parts of the cpu
We've now removed those limitations in the modelling so the 16GiB clamp no
longer serves a function. However, we can't just remove the limit
universally: that would break migration to earlier qemu versions, where
the 16GiB RMLS limit still applies, no matter how bad the reasons for it
are.
So, we replace the 16GiB clamp, with a clamp to a limit defined in the
machine type class. We set it to 16 GiB for machine types 4.2 and earlier,
but set it to 0 meaning unlimited for the new 5.0 machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode. Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.
The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.
There are several things wrong with this:
1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
of Node 0 memory size and the VRMA limit. That will *often* work the
same as clamping, but there can be other constraints on RMA size which
supersede Node 0 memory size. We have real bugs caused by this
(currently worked around in the guest kernel)
2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
we're past the point that we can actually advertise an RMA limit to the
guest
3) But most fundamentally, the VRMA limit depends on host configuration
(page size) which shouldn't be visible to the guest, but this partially
exposes it. This can cause problems with migration in certain edge
cases, although we will mostly get away with it.
In practice, this clamping is almost never applied anyway. With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit. It will hit with
4kiB pages, where it will be (guest memory size)/4. However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.
So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit. This can break if running HPT guests on a host kernel with
4kiB page size. As noted that's very rare. There also exist several
possible workarounds:
* Change the host kernel to use 64kiB pages
* Use radix MMU (RPT) guests instead of HPT
* Use 64kiB hugepages on the host to back guest memory
* Increase the guest memory size so that the RMA hits one of the fixed
limits before the RMA limit. This is relatively easy on POWER8 which
has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
* Use a guest NUMA configuration which artificially constrains the RMA
within the VRMA limit (the RMA must always fit within Node 0).
Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit. This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG. So we remove that as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
If a hot plug or unplug request is pending at CAS, we currently trigger
a CAS reboot, which severely increases the guest boot time. This is
because SLOF doesn't handle hot plug events and we had no way to fix
the FDT that gets presented to the guest.
We can do better thanks to recent changes in QEMU and SLOF:
- we now return a full FDT to SLOF during CAS
- SLOF was fixed to correctly detect any device that was either added or
removed since boot time and to update its internal DT accordingly.
The right solution is to process all pending hot plug/unplug requests
during CAS: convert hot plugged devices to cold plugged devices and
remove the hot unplugged ones, which is exactly what spapr_drc_reset()
does. Also clear all hot plug events that are currently queued since
they're no longer relevant.
Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs
at CAS. Until this limitation is lifted, SLOF will reset the machine when
this scenario occurs : this will allow the FDT to be fully processed when
SLOF is started again (ie. the same effect as the CAS reboot that would
occur anyway without this patch).
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <158257222352.4102917.8984214333937947307.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Fix various bugs that might result in an assert() due to
incorrect hflags for M-profile CPUs
* Fix Aspeed SMC Controller user-mode select handling
* Report correct (with-tag) address in fault address register
when TBI is enabled
* cubieboard: make sure SOC object isn't leaked
* fsl-imx25: Wire up eSDHC controllers
* fsl-imx25: Wire up USB controllers
* New board model: orangepi-pc (OrangePi PC)
* ARM/KVM: if user doesn't select GIC version and the
host kernel can only provide GICv3, use that, rather
than defaulting to "fail because GICv2 isn't possible"
* kvm: Only do KVM_SET_VCPU_EVENTS at the last stage of sync
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAl5qZsIZHHBldGVyLm1h
eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3krfD/40xprKOtpel6si3edDQsw5
j6LamqJDvaUdtG713OjR6yjvvQiXCw9yCDlfGhBlLhLW1t0aGKrrZoRC4CNMOt0J
34WAcDP0iz3ALEwpfNfr/DWFwiGjamabrRsGcq08Q42G7+UA7FhUEvL25ZApfRFy
8O2gs3bDv30pfa5oJwYJhvSHcNeR9YKueK+WGw16gRkSoNbjUxnSpyiGnxbMaNpg
aL+ZQzQ0BOAyeOg/0LUdZ9meAvWwR0NppgK0ujJxq68/6tPz8tv2+pQgllNYSQRO
vDr4mj6MlJNW62M5IAKRZ/6zTz34+7UYQ7mTK2VTWt2qtfrANz+EpcDljtc/8EIF
lAVd1W099DNdqgFcUGZzoWyRbmjz9B76WTJ43orY5AbMZN5l4XwAGItkE6yQbqKd
kqPKP2ICFj/0JhgBoTzo0J/5wV2izZKKnih990IJU390oWoiVRbdWlQDJ2ujQ3AV
havWhR/tL399K1UZl8act/J9rifq9J3mbiqpx2XEEiFMu93FDNCPtJioix1Swvpx
ERMB9VA6JNAHZ6oAGgNmTHG3nSJtpcin8XxR5YcKWSYiksPjkce1sEwtRbyxBHtq
jb/yk5mjyrbYTy3Gmg85/Fh74XnELsnwmADFdezHXUu4EPxth/ssCuXlXs8DIciI
sWGFVJpDoWSXEqi4FjvhIQ==
=LLqm
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200312' into staging
target-arm queue:
* Fix various bugs that might result in an assert() due to
incorrect hflags for M-profile CPUs
* Fix Aspeed SMC Controller user-mode select handling
* Report correct (with-tag) address in fault address register
when TBI is enabled
* cubieboard: make sure SOC object isn't leaked
* fsl-imx25: Wire up eSDHC controllers
* fsl-imx25: Wire up USB controllers
* New board model: orangepi-pc (OrangePi PC)
* ARM/KVM: if user doesn't select GIC version and the
host kernel can only provide GICv3, use that, rather
than defaulting to "fail because GICv2 isn't possible"
* kvm: Only do KVM_SET_VCPU_EVENTS at the last stage of sync
# gpg: Signature made Thu 12 Mar 2020 16:43:46 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20200312: (36 commits)
target/arm: kvm: Inject events at the last stage of sync
hw/arm/virt: kvm: allow gicv3 by default if v2 cannot work
hw/arm/virt: kvm: Restructure finalize_gic_version()
target/arm/kvm: Let kvm_arm_vgic_probe() return a bitmap
hw/arm/virt: Introduce finalize_gic_version()
hw/arm/virt: Introduce VirtGICType enum type
hw/arm/virt: Document 'max' value in gic-version property description
docs: add Orange Pi PC document
tests/boot_linux_console: Test booting NetBSD via U-Boot on OrangePi PC
tests/boot_linux_console: Add a SLOW test booting Ubuntu on OrangePi PC
tests/boot_linux_console: Add a SD card test for the OrangePi PC board
tests/boot_linux_console: Add initrd test for the Orange Pi PC board
tests/boot_linux_console: Add a quick test for the OrangePi PC board
hw/arm/allwinner: add RTC device support
hw/arm/allwinner-h3: add SDRAM controller device
hw/arm/allwinner-h3: add Boot ROM support
hw/arm/allwinner-h3: add EMAC ethernet device
hw/arm/allwinner: add SD/MMC host controller
hw/arm/allwinner: add Security Identifier device
hw/arm/allwinner: add CPU Configuration module
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's move the code which freezes which gic-version to
be applied in a dedicated function. We also now set by
default the VIRT_GIC_VERSION_NO_SET. This eventually
turns into the legacy v2 choice in the finalize() function.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200311131618.7187-4-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We plan to introduce yet another value for the gic version (nosel).
As we already use exotic values such as 0 and -1, let's introduce
a dedicated enum type and let vms->gic_version take this
type.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200311131618.7187-3-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Allwinner System-on-Chips usually contain a Real Time Clock (RTC)
for non-volatile system date and time keeping. This commit adds a generic
Allwinner RTC device that supports the RTC devices found in Allwinner SoC
family sun4i (A10), sun7i (A20) and sun6i and newer (A31, H2+, H3, etc).
The following RTC functionality and features are implemented:
* Year-Month-Day read/write
* Hour-Minute-Second read/write
* General Purpose storage
The following boards are extended with the RTC device:
* Cubieboard (hw/arm/cubieboard.c)
* Orange Pi PC (hw/arm/orangepi.c)
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-13-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the Allwinner H3 SoC the SDRAM controller is responsible
for interfacing with the external Synchronous Dynamic Random
Access Memory (SDRAM). Types of memory that the SDRAM controller
supports are DDR2/DDR3 and capacities of up to 2GiB. This commit
adds emulation support of the Allwinner H3 SDRAM controller.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-12-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A real Allwinner H3 SoC contains a Boot ROM which is the
first code that runs right after the SoC is powered on.
The Boot ROM is responsible for loading user code (e.g. a bootloader)
from any of the supported external devices and writing the downloaded
code to internal SRAM. After loading the SoC begins executing the code
written to SRAM.
This commits adds emulation of the Boot ROM firmware setup functionality
by loading user code from SD card in the A1 SRAM. While the A1 SRAM is
64KiB, we limit the size to 32KiB because the real H3 Boot ROM also rejects
sizes larger than 32KiB. For reference, this behaviour is documented
by the Linux Sunxi project wiki at:
https://linux-sunxi.org/BROM#U-Boot_SPL_limitations
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-11-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner Sun8i System on Chip family includes an Ethernet MAC (EMAC)
which provides 10M/100M/1000M Ethernet connectivity. This commit
adds support for the Allwinner EMAC from the Sun8i family (H2+, H3, A33, etc),
including emulation for the following functionality:
* DMA transfers
* MII interface
* Transmit CRC calculation
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-10-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner System on Chip families sun4i and above contain
an integrated storage controller for Secure Digital (SD) and
Multi Media Card (MMC) interfaces. This commit adds support
for the Allwinner SD/MMC storage controller with the following
emulated features:
* DMA transfers
* Direct FIFO I/O
* Short/Long format command responses
* Auto-Stop command (CMD12)
* Insert & remove card detection
The following boards are extended with the SD host controller:
* Cubieboard (hw/arm/cubieboard.c)
* Orange Pi PC (hw/arm/orangepi.c)
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200311221854.30370-9-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Security Identifier device found in various Allwinner System on Chip
designs gives applications a per-board unique identifier. This commit
adds support for the Allwinner Security Identifier using a 128-bit
UUID value as input.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-8-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Various Allwinner System on Chip designs contain multiple processors
that can be configured and reset using the generic CPU Configuration
module interface. This commit adds support for the Allwinner CPU
configuration interface which emulates the following features:
* CPU reset
* CPU status
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200311221854.30370-7-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner H3 System on Chip has an System Control
module that provides system wide generic controls and
device information. This commit adds support for the
Allwinner H3 System Control module.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200311221854.30370-6-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner H3 System on Chip contains multiple USB 2.0 bus
connections which provide software access using the Enhanced
Host Controller Interface (EHCI) and Open Host Controller
Interface (OHCI) interfaces. This commit adds support for
both interfaces in the Allwinner H3 System on Chip.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200311221854.30370-5-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Clock Control Unit is responsible for clock signal generation,
configuration and distribution in the Allwinner H3 System on Chip.
This commit adds support for the Clock Control Unit which emulates
a simple read/write register interface.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200311221854.30370-4-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Allwinner H3 is a System on Chip containing four ARM Cortex A7
processor cores. Features and specifications include DDR2/DDR3 memory,
SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
various I/O modules. This commit adds support for the Allwinner H3
System on Chip.
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200311221854.30370-2-nieklinnenbank@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
i.MX25 supports two USB controllers. Let's wire them up.
With this patch, imx25-pdk can boot from both USB ports.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200310215146.19688-3-linux@roeck-us.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Wire up eSDHC controllers in fsl-imx25. For imx25-pdk, connect drives
provided on the command line to available eSDHC controllers.
This patch enables booting the imx25-pdk emulation from SD card.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200310215146.19688-2-linux@roeck-us.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: made commit subject consistent with other patch]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Using the new 'bdrv_co_delete_file' interface, a pure co_routine function
'bdrv_co_delete_file' inside block.c can can be used in a way similar of
the existing bdrv_create_file to to clean up a created file.
We're creating a pure co_routine because the only caller of
'bdrv_co_delete_file' will be already in co_routine context, thus there
is no need to add all the machinery to check for qemu_in_coroutine() and
create a separated co_routine to do the job.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Adding to Block Drivers the capability of being able to clean up
its created files can be useful in certain situations. For the
LUKS driver, for instance, a failure in one of its authentication
steps can leave files in the host that weren't there before.
This patch adds the 'bdrv_co_delete_file' interface to block
drivers and add it to the 'file' driver in file-posix.c. The
implementation is given by 'raw_co_delete_file'.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20200310113831.27293-2-kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Hide structure definitions and add explicit API instead, to keep an
eye on the scope of the shared fields.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
offset/bytes pair is more usual naming in block layer, let's use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We have a lot of "chunk_end - start" invocations, let's switch to
bytes/cur_bytes scheme instead.
While being here, improve check on block_copy_do_copy parameters to not
overflow when calculating nbytes and use int64_t for bytes in
block_copy for consistency.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Assume we have two regions, A and B, and region B is in-flight now,
region A is not yet touched, but it is unallocated and should be
skipped.
Correspondingly, as progress we have
total = A + B
current = 0
If we reset unallocated region A and call progress_reset_callback,
it will calculate 0 bytes dirty in the bitmap and call
job_progress_set_remaining, which will set
total = current + 0 = 0 + 0 = 0
So, B bytes are actually removed from total accounting. When job
finishes we'll have
total = 0
current = B
, which doesn't sound good.
This is because we didn't considered in-flight bytes, actually when
calculating remaining, we should have set (in_flight + dirty_bytes)
as remaining, not only dirty_bytes.
To fix it, let's refactor progress calculation, moving it to block-copy
itself instead of fixing callback. And, of course, track in_flight
bytes count.
We still have to keep one callback, to maintain backup job bytes_read
calculation, but it will go on soon, when we turn the whole backup
process into one block_copy call.
Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We need it in separate to pass to the block-copy object in the next
commit.
Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The qcow2 .bdrv_measure() code calculates the crypto payload offset.
This logic really belongs in crypto/block.c where it can be reused by
other image formats.
The "luks" block driver will need this same logic in order to implement
.bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset()
function now.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-2-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus
have to change the licence to GPLv2
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Moved code was added after 2012-01-13, thus under GPLv2+
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixed commit message
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
These days device-hotplug.c only contains the hmp_drive_add
In the next patch, rest of hmp_drive* functions will be moved
there.
Also add block-hmp-cmds.h to contain prototypes of these
functions
License for block-hmp-cmds.h since it contains the code
moved from sysemu.h which lacks license and thus according
to LICENSE is under GPLv2+
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When there are many poll handlers it's likely that some of them are idle
most of the time. Remove handlers that haven't had activity recently so
that the polling loop scales better for guests with a large number of
devices.
This feature only takes effect for the Linux io_uring fd monitoring
implementation because it is capable of combining fd monitoring with
userspace polling. The other implementations can't do that and risk
starving fds in favor of poll handlers, so don't try this optimization
when they are in use.
IOPS improves from 10k to 105k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe.
[Clarified aio_poll_handlers locking discipline explanation in comment
after discussion with Paolo Bonzini <pbonzini@redhat.com>.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com
Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled
from userspace. Previously userspace polling was only allowed when all
AioHandler's had an ->io_poll() callback. This prevented starvation of
fds by userspace pollable handlers.
Add the FDMonOps->need_wait() callback that enables userspace polling
even when some AioHandlers lack ->io_poll().
For example, it's now possible to do userspace polling when a TCP/IP
socket is monitored thanks to Linux io_uring.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com
Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
The recent Linux io_uring API has several advantages over ppoll(2) and
epoll(2). Details are given in the source code.
Add an io_uring implementation and make it the default on Linux.
Performance is the same as with epoll(7) but later patches add
optimizations that take advantage of io_uring.
It is necessary to change how aio_set_fd_handler() deals with deleting
AioHandlers since removing monitored file descriptors is asynchronous in
io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and
aio_set_fd_handler() will let it handle deletion in that case.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com
Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
The AioHandler *node, bool is_new arguments are more complicated to
think about than simply being given AioHandler *old_node, AioHandler
*new_node.
Furthermore, the new Linux io_uring file descriptor monitoring mechanism
added by the new patch requires access to both the old and the new
nodes. Make this change now in preparation.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com
Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
The ppoll(2) and epoll(7) file descriptor monitoring implementations are
mixed with the core util/aio-posix.c code. Before adding another
implementation for Linux io_uring, extract out the existing
ones so there is a clear interface and the core code is simpler.
The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps
struct. See the patch for details.
Semantic changes:
1. ppoll(2) now reflects events from pollfds[] back into AioHandlers
while we're still on the clock for adaptive polling. This was
already happening for epoll(7), so if it's really an issue then we'll
need to fix both in the future.
2. epoll(7)'s fallback to ppoll(2) while external events are disabled
was broken when the number of fds exceeded the epoll(7) upgrade
threshold. I guess this code path simply wasn't tested and no one
noticed the bug. I didn't go out of my way to fix it but the correct
code is simpler than preserving the bug.
I also took some liberties in removing the unnecessary
AioContext->epoll_available (just check AioContext->epollfd != -1
instead) and AioContext->epoll_enabled (it's implicit if our
AioContext->fdmon_ops callbacks are being invoked) fields.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com
Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
Do not leave stale linked list pointers around after removal. It's
safer to set them to NULL so that use-after-removal results in an
immediate segfault.
The RCU queue removal macros are unchanged since nodes may still be
traversed after removal.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200224103406.1894923-2-stefanha@redhat.com
Message-Id: <20200224103406.1894923-2-stefanha@redhat.com>
Various headers are not required by hw/i386/pc.h:
- "qemu/range.h"
- "qemu/bitmap.h"
- "qemu/module.h"
- "exec/memory.h"
- "hw/pci/pci.h"
- "hw/mem/pc-dimm.h"
- "hw/mem/nvdimm.h"
- "net/net.h"
Remove them.
Add 3 headers that were missing:
- "hw/hotplug.h"
PCMachineState::acpi_dev is of type HotplugHandler
- "qemu/notify.h"
PCMachineState::machine_done is of type Notifier
- "qapi/qapi-types-common.h"
PCMachineState::vmport/smm is of type OnOffAuto
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200228114649.12818-19-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Only q35.c requires declarations from "hw/i386/pc.h", move it there.
Remove all the includes not used by "q35.h".
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200228114649.12818-18-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The MCHPCIState structure uses the Range type which is declared in
"qemu/range.h". Include it.
This fixes (when modifying unrelated headers):
In file included from hw/pci-host/q35.c:32:
include/hw/pci-host/q35.h:57:11: error: field has incomplete type 'Range' (aka 'struct Range')
Range pci_hole;
^
include/qemu/typedefs.h:116:16: note: forward declaration of 'struct Range'
typedef struct Range Range;
^
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200228114649.12818-13-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
intel_iommu.h does not use any of these includes, remove them.
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200228114649.12818-7-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>