Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
sources, IOV_MAX in POSIX). Because of this, on some host adapters
requests with many iovecs are rejected with -EINVAL by the
io_submit() or readv()/writev() system calls.
In fact, the same limit applies to SG_IO as well. To fix both the
EINVAL and the possible performance issues from using fewer iovecs
than allowed by Linux (some HBAs have max_segments as low as 128),
introduce a separate entry in BlockLimits to hold the max_segments
value from sysfs. This new limit is used only for SG_IO and clamped
to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
bs->bl.max_transfer.
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Cc: Hanna Reitz <hreitz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org
Cc: qemu-stable@nongnu.org
Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not round to power of 2", 2021-06-25)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210923130436.1187591-1-pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add version of bdrv_new_open_driver() that supports QDict options.
We'll use it in further commit.
Simply add one more argument to bdrv_new_open_driver() is worse, as
there are too many invocations of bdrv_new_open_driver() to update
then.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210920115538.264372-2-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are a couple of errors in bdrv_drained_begin header comment:
- block_job_pause does not exist anymore, it has been replaced
with job_pause in b15de82867
- job_pause is automatically invoked as a .drained_begin callback
(child_job_drained_begin) by the child_job BdrvChildClass struct
in blockjob.c. So no additional pause should be required.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210903113800.59970-1-eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Despite the comment, the members were not kept at the end.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use the MemOpIdx directly, rather than the rearrangement
of the same bits currently done by the trace infrastructure.
Pass in enum qemu_plugin_mem_rw so that we are able to treat
read-modify-write operations as a single operation.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move this code from tcg/tcg.h to its own header.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We're about to move this out of tcg.h, so rename it
as we did when moving MemOp.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We have lacked expressive support for memory sizes larger
than 64-bits for a while. Fixing that requires adjustment
to several points where we used this for array indexing,
and two places that develop -Wswitch warnings after the change.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
dup_const always generates a uint64_t, which may exceed the size of a
target_long (generating warnings with recent-enough compilers).
To ensure that we can use dup_const both for 64bit and 32bit targets,
this adds dup_const_tl, which either maps back to dup_const (for 64bit
targets) or provides a similar implementation using 32bit constants.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Message-Id: <20211003214243.3813425-1-philipp.tomsich@vrull.eu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Change caf108bc58 ("hw/i386/acpi-build: Add ACPI PCI hot-plug methods to Q35")
selects an IO address range for acpi based PCI hotplug for q35 arbitrarily. It
starts at address 0x0cc4 and ends at 0x0cdb. At the time when the patch was
written but the final version of the patch was not yet pushed upstream, this
address range was free and did not conflict with any other IO address ranges.
However, with the following change, this address range was no
longer conflict free as in this change, the IO address range
(value of ACPI_PCIHP_SIZE) was incremented by four bytes:
b32bd763a1 ("pci: introduce acpi-index property for PCI device")
This can be seen from the output of QMP command 'info mtree' :
0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
0000000000000cc4-0000000000000cdb (prio 0, i/o): acpi-pci-hotplug
0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-cpu-hotplug
It shows that there is a region of conflict between IO regions of acpi
pci hotplug and acpi cpu hotplug.
Unfortunately, the change caf108bc58 did not update the IO address range
appropriately before it was pushed upstream to accommodate the increased
length of the IO address space introduced in change b32bd763a1.
Due to this bug, windows guests complain 'This device cannot find
enough free resources it can use' in the device manager panel for extended
IO buses. This issue also breaks the correct functioning of pci hotplug as the
following shows that the IO space for pci hotplug has been truncated:
(qemu) info mtree -f
FlatView #0
AS "I/O", root: io
Root memory region: io
0000000000000cc4-0000000000000cd7 (prio 0, i/o): acpi-pci-hotplug
0000000000000cd8-0000000000000cf7 (prio 0, i/o): acpi-cpu-hotplug
Therefore, in this fix, we adjust the IO address range for the acpi pci
hotplug so that it does not conflict with cpu hotplug and there is no
truncation of IO spaces. The starting IO address of PCI hotplug region
has been decremented by four bytes in order to accommodate four byte
increment in the IO address space introduced by change
b32bd763a1 ("pci: introduce acpi-index property for PCI device")
After fixing, the following are the corrected IO ranges:
0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
0000000000000cc0-0000000000000cd7 (prio 0, i/o): acpi-pci-hotplug
0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-cpu-hotplug
This change has been tested using a Windows Server 2019 guest VM. Windows
no longer complains after this change.
Fixes: caf108bc58 ("hw/i386/acpi-build: Add ACPI PCI hot-plug methods to Q35")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/561
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Julia Suvorova <jusual@redhat.com>
Message-Id: <20210916132838.3469580-3-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-36-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-35-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian
conversions when building table and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-34-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
while at it, replace packed structure with endian agnostic
build_append_FOO() API.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-33-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
while at it, replace packed structure with endian agnostic
build_append_FOO() API.
PS:
Spec is Microsoft hosted, however 1.02 is no where to be found
(MS lists only the current revision) and the current revision is 1.07,
so bring comments in line with 1.07 as this is the only available spec.
There is no content change between originally implemented 1.02
(using QEMU code as reference) and 1.07. The only change is renaming
'Reserved2' field to 'Language', with the same 0 value.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-32-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building IORT table use endian agnostic build_append_int_noprefix()
API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210924122802.1455362-30-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-29-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building MADT table for arm/x86 and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-26-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building MADT table for arm/x86 and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-25-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of composing disabled _MAT entry and then later on
patching it to enabled for hotpluggbale CPUs in DSDT,
set it to enabled at the time _MAT entry is built.
It will allow to drop usage of packed structures in
following patches when build_madt() is switched to use
build_append_int_noprefix() API.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-24-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-23-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-22-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries tables.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-19-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building SRAT tables for arm/x86 and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-18-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries (which also removes some manual offset
calculations)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-17-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries (which also removes some manual offset
calculations).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-16-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
while at it convert build_hpet() to endian agnostic
build_append_FOO() API
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210924122802.1455362-15-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
entries to other tables (which also removes some manual offset
calculations).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offests magic from API user.
While at it switch to build_append_int_noprefix() to build
entries to other tables (which also removes some manual offset
calculations).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Patch introduces acpi_table_begin()/ acpi_table_end() API
that hides pointer/offset arithmetic from user as opposed
to build_header(), to prevent errors caused by it [1].
acpi_table_begin():
initializes table header and keeps track of
table data/offsets
acpi_table_end():
sets actual table length and tells bios loader
where table is for the later initialization on
guest side.
1) commits
bb9feea431 x86: acpi: use offset instead of pointer when using build_header()
4d027afeb3 Virt: ACPI: fix qemu assert due to re-assigned table data address
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
virtio-vsock features, like VIRTIO_VSOCK_F_SEQPACKET, can be handled
by vhost-vsock-common parent class. In this way, we can reuse the
same code for all virtio-vsock backends (i.e. vhost-vsock,
vhost-user-vsock).
Let's move `seqpacket` property to vhost-vsock-common class, add
vhost_vsock_common_get_features() used by children, and disable
`seqpacket` for vhost-user-vsock device for machine types < 6.2.
The behavior of vhost-vsock device doesn't change; vhost-user-vsock
device now supports `seqpacket` property.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210921161642.206461-3-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
enabled the SEQPACKET feature bit.
This commit is released with QEMU 6.1, so if we try to migrate a VM where
the host kernel supports SEQPACKET but machine type version is less than
6.1, we get the following errors:
Features 0x130000002 unsupported. Allowed features: 0x179000000
Failed to load virtio-vhost_vsock:virtio
error while loading state for instance 0x0 of device '0000:00:05.0/virtio-vhost_vsock'
load of migration failed: Operation not permitted
Let's disable the feature bit for machine types < 6.1.
We add a new OnOffAuto property for this, called `seqpacket`.
When it is `auto` (default), QEMU behaves as before, trying to enable the
feature, when it is `on` QEMU will fail if the backend (vhost-vsock
kernel module) doesn't support it.
Fixes: 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
Cc: qemu-stable@nongnu.org
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210921161642.206461-2-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.
No functional change intended.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we have a generic smp parser for all arches, and there will
not be any other arch specific ones, so let's remove the callback
from MachineClass and call the parser directly.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-14-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently the only difference between smp_parse and pc_smp_parse
is the support of dies parameter and the related error reporting.
With some arch compat variables like "bool dies_supported", we can
make smp_parse generic enough for all arches and the PC specific
one can be removed.
Making smp_parse() generic enough can reduce code duplication and
ease the code maintenance, and also allows extending the topology
with more arch specific members (e.g., clusters) in the future.
Suggested-by: Andrew Jones <drjones@redhat.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-13-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all the possible topology parameters are integrated in struct
CpuTopology, tweak the order of topology members to be "cpus/sockets/
dies/cores/threads/maxcpus" for readability and consistency. We also
tweak the comment by adding explanation of dies parameter.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210929025816.21076-12-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.
Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.
In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Here's the next batch of ppc related patches for qemu-6.2. Highlights
are:
* Fixes for several TCG math instructions from the El Dorado Institute
* A number of improvements to the powernv machine type
* Support for a new DEVICE_UNPLUG_GUEST_ERROR QAPI event from Daniel
Barboza
* Support for the new FORM2 PAPR NUMA representation. This allows
more specific NUMA distances, as well as asymmetric configurations
* Fix for 64-bit decrementer (used on MicroWatt CPUs)
* Assorted fixes and cleanups
* A number of updates to MAINTAINERS
Note that the DEVICE_UNPLUG_GUEST_ERROR stuff includes changes to
files outside my normal area, but has suitable Acks.
The MAINTAINERS updates are mostly about marking minor platforms
unmaintained / orphaned, and moving some pieces away from myself and
Greg. As we move onto other projects, we're going to need to drop
more of the ppc maintainership, though we're hoping we can avoid too
abrupt a change.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmFVTlEACgkQbDjKyiDZ
s5IATRAAi3qB5riIXQEO41SoG5EjwJ9+2VQmagMkgwyNTQp5SyrqiWpTJ7Cq8IMF
inEgV8vHv40GFJLdXXjTvdsRkAxh1siVZR7okmbd144ZdwL/ntJ4FFiGG5BjFgf2
DajbDW/RgNDsWkTFbnhKVDNTzFbxHgeGf/g+KOPmQ5MtPPJk/Og2jJVOVE+CCY+Q
FsF7xe9ehffrGP4ObkwKM8pNMWEaJ8KP+Z1ZV5WQwQSHPZD4x1qb6aaH8JQWv7Fl
pqep5nlUv0KNsRPFj4hGhzjB9JUMMnWtSEuw5GNL0bQAj1v6tPNESRMw6fInAR/A
JlvG6iYba2ZKZ7+SGipwYyVzBPnE7ur9i2E5MaANTSQF1zgsSwcYT6J5dYSCZmNH
XrEZYYm1MyCTem8PNKrP44QRQAQz+vfHb8hUkwzNsZC7DLLsZ3JQ2AWcAvWFOKRe
/s3WyODeXa3gZMCBFbvCWVAPiKNNFGQMsa2/bkF7FpvNGl99xL3o6MDskA2dSdTK
0oreMT9OKdDS2W2LpDCasq6HHVsoJ+hc+TkWCEshEEPb7jUAIOnKqhHh/aqlXVf7
J3r68FFbNbBoPR0SgdLc4teBmUKDJIGpZT8gaJ7Vbx6IiHXH21ZzOHKuTjbJZWWN
u1vcjBBqvbQmrwhCyi6Fe5gGM63GcB/ulizM1rNyqIpTUuzyx9I=
=shYQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.2-20210930' into staging
ppc patch queue for 2021-09-30
Here's the next batch of ppc related patches for qemu-6.2. Highlights
are:
* Fixes for several TCG math instructions from the El Dorado Institute
* A number of improvements to the powernv machine type
* Support for a new DEVICE_UNPLUG_GUEST_ERROR QAPI event from Daniel
Barboza
* Support for the new FORM2 PAPR NUMA representation. This allows
more specific NUMA distances, as well as asymmetric configurations
* Fix for 64-bit decrementer (used on MicroWatt CPUs)
* Assorted fixes and cleanups
* A number of updates to MAINTAINERS
Note that the DEVICE_UNPLUG_GUEST_ERROR stuff includes changes to
files outside my normal area, but has suitable Acks.
The MAINTAINERS updates are mostly about marking minor platforms
unmaintained / orphaned, and moving some pieces away from myself and
Greg. As we move onto other projects, we're going to need to drop
more of the ppc maintainership, though we're hoping we can avoid too
abrupt a change.
# gpg: Signature made Thu 30 Sep 2021 06:42:41 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dg-gitlab/tags/ppc-for-6.2-20210930: (44 commits)
MAINTAINERS: Demote sPAPR from "Supported" to "Maintained"
MAINTAINERS: Add information for OpenPIC
MAINTAINERS: Remove David & Greg as reviewers/co-maintainers of powernv
MAINTAINERS: Orphan obscure ppc platforms
MAINTAINERS: Remove David & Greg as reviewers for a number of boards
MAINTAINERS: Remove machine specific files from ppc TCG CPUs entry
spapr/xive: Fix kvm_xive_source_reset trace event
spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()
hw/intc: openpic: Clean up the styles
hw/intc: openpic: Drop Raven related codes
hw/intc: openpic: Correct the reset value of IPIDR for FSL chipset
target/ppc: Fix 64-bit decrementer
target/ppc: Convert debug to trace events (decrementer and IRQ)
spapr_numa.c: handle auto NUMA node with no distance info
spapr_numa.c: FORM2 NUMA affinity support
spapr: move FORM1 verifications to post CAS
spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
spapr_numa.c: parametrize FORM1 macros
spapr_numa.c: scrap 'legacy_numa' concept
spapr_numa.c: split FORM1 code into helpers
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Provide a name field for all the memory listeners. It can be used to identify
which memory listener is which.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Libvirt can use query-sgx-capabilities to get the host
sgx capabilities to decide how to allocate SGX EPC size to VM.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210910102258.46648-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The QMP and HMP interfaces can be used by monitor or QMP tools to retrieve
the SGX information from VM side when SGX is enabled on Intel platform.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210910102258.46648-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add helpers to detect if SGX EPC exists above 4g, and if so, where SGX
EPC above 4g ends. Use the helpers to adjust the device memory range
if SGX EPC exists above 4g.
For multiple virtual EPC sections, we just put them together physically
contiguous for the simplicity because we don't support EPC NUMA affinity
now. Once the SGX EPC NUMA support in the kernel SGX driver, we will
support this in the future.
Note that SGX EPC is currently hardcoded to reside above 4g.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-18-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose SGX to the guest if and only if KVM is enabled and supports
virtualization of SGX. While the majority of ENCLS can be emulated to
some degree, because SGX uses a hardware-based root of trust, the
attestation aspects of SGX cannot be emulated in software, i.e.
ultimately emulation will fail as software cannot generate a valid
quote/report. The complexity of partially emulating SGX in Qemu far
outweighs the value added, e.g. an SGX specific simulator for userspace
applications can emulate SGX for development and testing purposes.
Note, access to the PROVISIONKEY is not yet advertised to the guest as
KVM blocks access to the PROVISIONKEY by default and requires userspace
to provide additional credentials (via ioctl()) to expose PROVISIONKEY.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-13-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be
realized prior to realizing the vCPUs themselves, i.e. long before
generic devices are parsed and realized. From a virtualization
perspective, the CPUID aspect also means that EPC sections cannot be
hotplugged without paravirtualizing the guest kernel (hardware does
not support hotplugging as EPC sections must be locked down during
pre-boot to provide EPC's security properties).
So even though EPC sections could be realized through the generic
-devices command, they need to be created much earlier for them to
actually be usable by the guest. Place all EPC sections in a
contiguous block, somewhat arbitrarily starting after RAM above 4g.
Ensuring EPC is in a contiguous region simplifies calculations, e.g.
device memory base, PCI hole, etc..., allows dynamic calculation of the
total EPC size, e.g. exposing EPC to guests does not require -maxmem,
and last but not least allows all of EPC to be enumerated in a single
ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8.
The new compound properties command for sgx like below:
......
-object memory-backend-epc,id=mem1,size=28M,prealloc=on \
-object memory-backend-epc,id=mem2,size=10M \
-M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be
realized prior to realizing the vCPUs themselves, which occurs long
before generic devices are parsed and realized. Because of this,
do not allow 'sgx-epc' devices to be instantiated after vCPUS have
been created.
The 'sgx-epc' device is essentially a placholder at this time, it will
be fully implemented in a future patch along with a dedicated command
to create 'sgx-epc' devices.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
EPC (Enclave Page Cahe) is a specialized type of memory used by Intel
SGX (Software Guard Extensions). The SDM desribes EPC as:
The Enclave Page Cache (EPC) is the secure storage used to store
enclave pages when they are a part of an executing enclave. For an
EPC page, hardware performs additional access control checks to
restrict access to the page. After the current page access checks
and translations are performed, the hardware checks that the EPC
page is accessible to the program currently executing. Generally an
EPC page is only accessed by the owner of the executing enclave or
an instruction which is setting up an EPC page.
Because of its unique requirements, Linux manages EPC separately from
normal memory. Similar to memfd, the device /dev/sgx_vepc can be
opened to obtain a file descriptor which can in turn be used to mmap()
EPC memory.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that
looks and acts like RAM but is inaccessible via normal mechanisms,
including DMA. Use the flag to skip protected memory regions when
mapping RAM for DMA in VFIO.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The function ide_bus_new() does an in-place initialization. Rename
it to ide_bus_init() to follow our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: John Snow <jsnow@redhat.com> (Feel free to merge.)
Message-id: 20210923121153.23754-7-peter.maydell@linaro.org
Rename the "allocate and return" qbus creation function to
qbus_new(), to bring it into line with our _init vs _new convention.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20210923121153.23754-6-peter.maydell@linaro.org
Rename qbus_create_inplace() to qbus_init(); this is more in line
with our usual naming convention for functions that in-place
initialize objects.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-5-peter.maydell@linaro.org
Rename the pci_root_bus_new_inplace() function to
pci_root_bus_init(); this brings the bus type in to line with a
"_init for in-place init, _new for allocate-and-return" convention.
To do this we need to rename the implementation-internal function
that was using the pci_root_bus_init() name to
pci_root_bus_internal_init().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-4-peter.maydell@linaro.org
Rename ipack_bus_new_inplace() to ipack_bus_init(), to bring it in to
line with a "_init for in-place init, _new for allocate-and-return"
convention. Drop the 'name' argument, because the only caller does
not pass in a name. If a future caller does need to specify the bus
name, we should create an ipack_bus_init_named() function at that
point.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20210923121153.23754-3-peter.maydell@linaro.org
The function scsi_bus_new() creates a new SCSI bus; callers can
either pass in a name argument to specify the name of the new bus, or
they can pass in NULL to allow the bus to be given an automatically
generated unique name. Almost all callers want to use the
autogenerated name; the only exception is the virtio-scsi device.
Taking a name argument that should almost always be NULL is an
easy-to-misuse API design -- it encourages callers to think perhaps
they should pass in some standard name like "scsi" or "scsi-bus". We
don't do this anywhere for SCSI, but we do (incorrectly) do it for
other bus types such as i2c.
The function name also implies that it will return a newly allocated
object, when it in fact does in-place allocation. We more commonly
name such functions foo_init(), with foo_new() being the
allocate-and-return variant.
Replace all the scsi_bus_new() callsites with either:
* scsi_bus_init() for the usual case where the caller wants
an autogenerated bus name
* scsi_bus_init_named() for the rare case where the caller
needs to specify the bus name
and document that for the _named() version it's then the caller's
responsibility to think about uniqueness of bus names.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210923121153.23754-2-peter.maydell@linaro.org
Connect the support for ZynqMP eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=3,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 768 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-9-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Xilinx ZynqMP Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=2,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-8-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal eFUSE one-time field-programmable
bit array.
The command argument:
-drive if=pflash,index=1,...
Can be used to optionally connect the bit array to a
backend storage, such that field-programmed values
in one invocation can be made available to next
invocation.
The backend storage must be a seekable binary file, and
its size must be 3072 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-7-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Connect the support for Versal Battery-Backed RAM (BBRAM)
The command argument:
-drive if=pflash,index=0,...
Can be used to optionally connect the bbram to a backend
storage, such that field-programmed values in one
invocation can be made available to next invocation.
The backend storage must be a seekable binary file, and
its size must be 36 bytes or larger. A file with all
binary 0's is a 'blank'.
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-6-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This device is present in Versal and ZynqMP product
families to store a 256-bit encryption key.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-5-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx ZynqMP eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx ZynqMP product family.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-4-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This implements the Xilinx Versal eFuse, an one-time
field-programmable non-volatile storage device. There is
only one such device in the Xilinx Versal product family.
This device has two separate mmio interfaces, a controller
and a flatten readback.
The controller provides interfaces for field-programming,
configuration, control, and status.
The flatten readback is a cache to provide a byte-accessible
read-only interface to efficiently read efuse array.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-3-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This introduces the QOM for Xilinx eFuse, an one-time
field-programmable storage bit array.
The actual mmio interface to the array varies by device
families and will be provided in different change-sets.
Co-authored-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Co-authored-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Sai Pavan Boddu <sai.pavan.boddu@xilinx.com>
Signed-off-by: Tong Ho <tong.ho@xilinx.com>
Message-id: 20210917052400.1249094-2-tong.ho@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmFU1a4ACgkQp6FrSiUn
Q2obEggAq4KgnBILBip6TsFc9p7yzVpK9qWzri0vhJHOI7p+dE5Bxqt8sLBEdMQP
OiV5WE/ZRHogGULXXJChBmUTK2/z0U57AXVvbCpKWgr48kSj818dk8uhuUjBeWaM
5Qc8PsH+/Rij5WRlVmTePu7QMwXp1h3+gkw3fmLlHRkvzO4MmOBn1lrOqnCxSGbo
xnFvfdeplNiexmpImdO6QSaHfDsmnUqOFpBPZUsKXfdHOiRqg2I3eU1ibv8eQH7B
oTtBxWI0KHc6kWaJUWqZbEe4ChJTHvAsVdZxmB+il1diIl46lq+s/Zn7nT+0HXTD
pS/Fws9rrGKS/7PnGeFOfskVVCRnIw==
=VGqR
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-09-27-v2' into staging
nbd patches for 2021-09-27
- Vladimir Sementsov-Ogievskiy: Rework coroutines of qemu NBD client
to improve reconnect support
- Eric Blake: Relax server in regards to NBD_OPT_LIST_META_CONTEXT
- Vladimir Sementsov-Ogievskiy: Plumb up 64-bit bulk-zeroing support
in block layer, in preparation for future NBD spec extensions
- Nir Soffer: Default to writeback cache in qemu-nbd
# gpg: Signature made Wed 29 Sep 2021 22:07:58 BST
# gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg: aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-nbd-2021-09-27-v2:
block/nbd: check that received handle is valid
block/nbd: drop connection_co
block/nbd: refactor nbd_recv_coroutines_wake_all()
block/nbd: move nbd_recv_coroutines_wake_all() up
block/nbd: nbd_channel_error() shutdown channel unconditionally
nbd/client-connection: nbd_co_establish_connection(): fix non set errp
nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY
block/io: allow 64bit discard requests
block: use int64_t instead of int in driver discard handlers
block: make BlockLimits::max_pdiscard 64bit
block/io: allow 64bit write-zeroes requests
block: use int64_t instead of int in driver write_zeroes handlers
block: make BlockLimits::max_pwrite_zeroes 64bit
block: use int64_t instead of uint64_t in copy_range driver handlers
block: use int64_t instead of uint64_t in driver write handlers
block: use int64_t instead of uint64_t in driver read handlers
qcow2: check request on vmstate save/load path
block/io: bring request check to bdrv_co_(read,write)v_vmstate
qemu-nbd: Change default cache mode to writeback
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Correct the multi-line comment format. No functional changes.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-3-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no machine that uses Motorola MCP750 (aka Raven) model.
Drop the related codes.
While we are here, drop the mentioning of Intel GW80314 I/O
companion chip in the comments as it has been obsolete for years,
and correct a typo too.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20210918032653.646370-2-bin.meng@windriver.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.
Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.
We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver discard handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_pdiscard in
block/io.c. It is already prepared to work with 64bit requests, but
pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver.
Let's look at all updated functions:
blkdebug: all calculations are still OK, thanks to
bdrv_check_qiov_request().
both rule_check and bdrv_co_pdiscard are 64bit
blklogwrites: pass to blk_loc_writes_co_log which is 64bit
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to
cbw_do_copy_before_write which is 64bit
file-posix: one handler calls raw_account_discard() is 64bit and both
handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass
to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
raw_account_discard())
gluster: somehow, third argument of glfs_discard_async is size_t.
Let's set max_pdiscard accordingly.
iscsi: iscsi_allocmap_set_invalid is 64bit,
!is_byte_request_lun_aligned is 64bit.
list.num is uint32_t. Let's clarify max_pdiscard and
pdiscard_alignment.
mirror_top: pass to bdrv_mirror_top_do_write() which is
64bit
nbd: protocol limitation. max_pdiscard is alredy set strict enough,
keep it as is for now.
nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits
to nvme_refresh_limits().
preallocate: pass to bdrv_co_pdiscard() which is 64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
qcow2_cluster_discard() is 64bit.
raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
throttle: pass to bdrv_co_pdiscard() which is 64bit and to
throttle_group_co_io_limits_intercept() which is 64bit as well.
test-block-iothread: bytes argument is unused
Great! Now all drivers are prepared to handle 64bit discard requests,
or else have explicit max_pdiscard limits.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to support 64 bit discard requests. Now update the
limit variable. It's absolutely safe. The variable is set in some
drivers, and used in bdrv_co_pdiscard().
Update also max_pdiscard variable in bdrv_co_pdiscard(), so that
bdrv_co_pdiscard() is now prepared for 64bit requests. The remaining
logic including num, offset and bytes variables is already
supporting 64bit requests.
So the only thing that prevents 64 bit requests is limiting
max_pdiscard variable to INT_MAX in bdrv_co_pdiscard().
We'll drop this limitation after updating all block drivers.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210903102807.27127-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver write_zeroes handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_do_pwrite_zeroes().
bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
max_write_zeroes is limited to INT_MAX. So, updated functions all are
safe, they will not get "bytes" larger than before.
Still, let's look through all updated functions, and add assertions to
the ones which are actually unprepared to values larger than INT_MAX.
For these drivers also set explicit max_pwrite_zeroes limit.
Let's go:
blkdebug: calculations can't overflow, thanks to
bdrv_check_qiov_request() in generic layer. rule_check() and
bdrv_co_pwrite_zeroes() both have 64bit argument.
blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.
blkreplay, copy-on-read, filter-compress: pass to
bdrv_co_pwrite_zeroes() which is OK
copy-before-write: Calls cbw_do_copy_before_write() and
bdrv_co_pwrite_zeroes, both have 64bit argument.
file-posix: both handler calls raw_do_pwrite_zeroes, which is updated.
In raw_do_pwrite_zeroes() calculations are OK due to
bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
which is uint64_t.
Check also where that uint64_t gets handed:
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap. All look safe.
gluster: bytes go to GlusterAIOCB::size which is int64_t and to
glfs_zerofill_async works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task() that has
uint32_t num_blocks argument and iscsi_writesame16_task() has
uint16_t argument. Make comments, add assertions and clarify
max_pwrite_zeroes calculation.
iscsi_allocmap_() functions already has int64_t argument
is_byte_request_lun_aligned is simple to update, do it.
mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t
argument
nbd: Aha, here we have protocol limitation, and NBDRequest::len is
uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
OK for now.
nvme: Again, protocol limitation. And no inherent limit for
write-zeroes at all. But from code that calculates cdw12 it's obvious
that we do have limit and alignment. Let's clarify it. Also,
obviously the code is not prepared to handle bytes=0. Let's handle
this case too.
trace events already 64bit
preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both
64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: offset + bytes and alignment still works good (thanks to
bdrv_check_qiov_request()), so tail calculation is OK
qcow2_subcluster_zeroize() has 64bit argument, should be OK
trace events updated
qed: qed_co_request wants int nb_sectors. Also in code we have size_t
used for request length which may be 32bit. So, let's just keep
INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
don't care.
raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
64bit.
throttle: Both throttle_group_co_io_limits_intercept() and
bdrv_co_pwrite_zeroes() are 64bit.
vmdk: pass to vmdk_pwritev which is 64bit
quorum: pass to quorum_co_pwritev() which is 64bit
Hooray!
At this point all block drivers are prepared to support 64bit
write-zero requests, or have explicitly set max_pwrite_zeroes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: use <= rather than < in assertions relying on max_pwrite_zeroes]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to support 64 bit write-zeroes requests. Now update the
limit variable. It's absolutely safe. The variable is set in some
drivers, and used in bdrv_co_do_pwrite_zeroes().
Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so
that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The
remaining logic including num, offset and bytes variables is already
supporting 64bit requests.
So the only thing that prevents 64 bit requests is limiting
max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes().
We'll drop this limitation after updating all block drivers.
Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It
will be modified to do bdrv_check_request() for write-zeroes path.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver copy_range handlers parameters which are already
64bit to signed type.
Now let's consider all callers. Simple
git grep '\->bdrv_co_copy_range'
shows the only caller:
bdrv_co_copy_range_internal(), which does bdrv_check_request32(),
so everything is OK.
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_co_copy_range_\(from\|to\)\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
shows no more callers. So, we are done.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210903102807.27127-6-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver write handlers parameters which are already 64bit to
signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'
shows that's there three callers of driver function:
bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in
block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to
be non-negative.
qcow2_save_vmstate() does bdrv_check_qiov_request().
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
shows several callers:
qcow2:
qcow2_co_truncate() write at most up to @offset, which is checked in
generic qcow2_co_truncate() by bdrv_check_request().
qcow2_co_pwritev_compressed_task() pass the request (or part of the
request) that already went through normal write path, so it should
be OK
qcow:
qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch
quorum:
quorum_co_pwrite_zeroes() pass int64_t and int - OK
throttle:
throttle_co_pwritev_compressed() pass int64_t, it's updated by this
patch
vmdk:
vmdk_co_pwritev_compressed() pass int64_t, it's updated by this
patch
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver read handlers parameters which are already 64bit to
signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
shows that's there three callers of driver function:
bdrv_driver_preadv() in block/io.c, passes int64_t, checked by
bdrv_check_qiov_request() to be non-negative.
qcow2_load_vmstate() does bdrv_check_qiov_request().
do_perform_cow_read() has uint64_t argument. And a lot of things in
qcow2 driver are uint64_t, so converting it is big job. But we must
not work with requests that don't satisfy bdrv_check_qiov_request(),
so let's just assert it here.
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
The only one such caller:
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
...
ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
in tests/unit/test-bdrv-drain.c, and it's OK obviously.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix typos]
Signed-off-by: Eric Blake <eblake@redhat.com>
We modify the request by adding an offset to vmstate. Let's check the
modified request. It will help us to safely move .bdrv_co_preadv_part
and .bdrv_co_pwritev_part to int64_t type of offset and bytes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210903102807.27127-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This to avoid possible conflicts with the "id" property of QOM objects.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20210901094153.227671-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce uabs64(), a function that returns the absolute value of
a 64-bit int as an unsigned value. This avoids the undefined behavior
for common abs implementations, where abs of the most negative value is
undefined.
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210910112624.72748-4-luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
*plow (lower 64 bits of the dividend) is passed into divs128() as
a signed 64-bit integer. When building an __int128_t from it, it
must be zero-extended, instead of sign-extended.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Message-Id: <20210910112624.72748-3-luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each Nubus slot has an IRQ line that can be used to request service from the
CPU. Connect the IRQs to the Nubus bridge so that they can be wired up using qdev
gpios accordingly, and introduce a new nubus_set_irq() function that can be used
by Nubus devices to control the slot IRQ.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210924073808.1041-19-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Since nubus-bridge is a container for NubusBus then it should be embedded
directly within the bridge device using qbus_create_inplace().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-17-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Now that Nubus has its own address space rather than mapping directly into the
system bus, move the Nubus reference from MacNubusBridge to NubusBridge.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-16-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This better reflects that the mac-nubus-bridge device is derived from the
nubus-bridge device, and that the structure represents the state of the bridge
device and not the Nubus itself. Also update the comment in the file header to
reflect that mac-nubus-bridge is specific to the Macintosh.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-15-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This is to allow the Nubus bridge to store its own additional state. Also update
the comment in the file header to reflect that nubus-bridge is not specific to
the Macintosh.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-14-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
According to "Designing Cards and Drivers for the Macintosh Family" the Nubus
has its own 32-bit address space based upon physical slot addressing.
Move Nubus to its own 32-bit address space and then use memory region aliases
to map available slot and super slot ranges into the q800 system address
space via the Macintosh Nubus bridge.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The declaration ROM is located at the top-most address of the standard slot
space.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Since there is no need to generate a dummy declaration ROM, remove both
nubus_register_rom() and nubus_register_format_block(). These will shortly be
replaced with a mechanism to optionally load a declaration ROM from disk to
allow real images to be used within QEMU.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Convert nubus_device_realize() to use a bitmap to manage available slots to allow
for future Nubus devices to be plugged into arbitrary slots from the command line
using a new qdev "slot" parameter for nubus devices.
Update mac_nubus_bridge_init() to only allow slots 0x9 to 0xe on Macintosh machines
as documented in "Designing Cards and Drivers for the Macintosh Family".
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210924073808.1041-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
According to "Designing Cards and Drivers for the Macintosh Family" each physical
nubus slot can access 2 separate address ranges: a super slot memory region which
is 256MB and a standard slot memory region which is 16MB.
Currently a Nubus device uses the physical slot number to determine whether it is
using a standard slot memory region or a super slot memory region rather than
exposing both memory regions for use as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
This is in preparation for creating a qdev property of the same name.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210924073808.1041-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The PC_ROM_* definitions are only used by the PC machine,
and are irrelevant to the other architectures / machines.
Reduce their scope by moving them to hw/i386/pc.c.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210917185949.2244956-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Fix is added to check for reserved value in select field for
namespace attachment
CC: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Naveen Nagar <naveen.n1@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
While we may have had some thought of allowing system-mode
to return from this hook, we have no guests that require this.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There is nothing target specific about this. The implementation
is host specific, but the declaration is 100% common.
Reviewed-By: Warner Losh <imp@bsdimp.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
- ePMP CSR address updates
- Convert internal interrupts to use QEMU GPIO lines
- SiFive PWM support
- Support for RISC-V ACLINT
- SiFive PDMA fixes
- Update to u-boot instructions for sifive_u
- mstatus.SD bug fix for hypervisor extensions
- OpenTitan fix for USB dev address
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE9sSsRtSTSGjTuM6PIeENKd+XcFQFAmFJgSoACgkQIeENKd+X
cFQTOwf8DC7rBqOWQS3v/r+H2hlfDqW+4G3pPPBcoyCEiqO+cL26ox+EmTHDbieh
+0yWyp7L6SU/zcJ86oBAFNGH46ltXuUKOYWhkfA1QwlGzAwjZ82hnZ3jJqXf1jin
Wq0ElzKk6rvcRkHTVhdjkGvoxskaXPQ/kFzyTHrxMDlkmHO3L4IaYe0xsamRI11D
E7UJC97YmpSAsCNUc5irpkeLyiFobyR8TEL3nBEPK/6Xj0ojRT4zoGe1EotC7+sN
zL8a9ZuU0bL3rQH8Ai7wnXBP8D2PQa0tZQV6wne/BzeEUSpKrC/rGW73vQCz0Pps
U8VNkIlbAqD1s6aXlqE24H535x10Mw==
=WYF5
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/alistair23/tags/pull-riscv-to-apply-20210921' into staging
Second RISC-V PR for QEMU 6.2
- ePMP CSR address updates
- Convert internal interrupts to use QEMU GPIO lines
- SiFive PWM support
- Support for RISC-V ACLINT
- SiFive PDMA fixes
- Update to u-boot instructions for sifive_u
- mstatus.SD bug fix for hypervisor extensions
- OpenTitan fix for USB dev address
# gpg: Signature made Mon 20 Sep 2021 11:52:26 PM PDT
# gpg: using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8 CE8F 21E1 0D29 DF97 7054
* remotes/alistair23/tags/pull-riscv-to-apply-20210921: (21 commits)
hw/riscv: opentitan: Correct the USB Dev address
target/riscv: csr: Rename HCOUNTEREN_CY and friends
target/riscv: Backup/restore mstatus.SD bit when virtual register swapped
docs/system/riscv: sifive_u: Update U-Boot instructions
hw/dma: sifive_pdma: don't set Control.error if 0 bytes to transfer
hw/dma: sifive_pdma: allow non-multiple transaction size transactions
hw/dma: sifive_pdma: claim bit must be set before DMA transactions
hw/dma: sifive_pdma: reset Next* registers when Control.claim is set
hw/riscv: virt: Add optional ACLINT support to virt machine
hw/riscv: virt: Re-factor FDT generation
hw/intc: Upgrade the SiFive CLINT implementation to RISC-V ACLINT
hw/intc: Rename sifive_clint sources to riscv_aclint sources
sifive_u: Connect the SiFive PWM device
hw/timer: Add SiFive PWM support
hw/intc: ibex_timer: Convert the timer to use RISC-V CPU GPIO lines
hw/intc: sifive_plic: Convert the PLIC to use RISC-V CPU GPIO lines
hw/intc: ibex_plic: Convert the PLIC to use RISC-V CPU GPIO lines
hw/intc: sifive_clint: Use RISC-V CPU GPIO lines
target/riscv: Expose interrupt pending bits as GPIO lines
target/riscv: Fix satp write
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>