The new Cluster-Aware Scheduling support has landed in Linux 5.16,
which has been proved to benefit the scheduling performance (e.g.
load balance and wake_affine strategy) on both x86_64 and AArch64.
So now in Linux 5.16 we have four-level arch-neutral CPU topology
definition like below and a new scheduler level for clusters.
struct cpu_topology {
int thread_id;
int core_id;
int cluster_id;
int package_id;
int llc_id;
cpumask_t thread_sibling;
cpumask_t core_sibling;
cpumask_t cluster_sibling;
cpumask_t llc_sibling;
}
A cluster generally means a group of CPU cores which share L2 cache
or other mid-level resources, and it is the shared resources that
is used to improve scheduler's behavior. From the point of view of
the size range, it's between CPU die and CPU core. For example, on
some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
and 4 CPU cores in each cluster. The 4 CPU cores share a separate
L2 cache and a L3 cache tag, which brings cache affinity advantage.
In virtualization, on the Hosts which have pClusters (physical
clusters), if we can design a vCPU topology with cluster level for
guest kernel and have a dedicated vCPU pinning. A Cluster-Aware
Guest kernel can also make use of the cache affinity of CPU clusters
to gain similar scheduling performance.
This patch adds infrastructure for CPU cluster level topology
configuration and parsing, so that the user can specify cluster
parameter if their machines support it.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Message-Id: <20211228092221.21068-3-wangyanan55@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[PMD: Added '(since 7.0)' to @clusters in qapi/machine.json]
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
dma_memory_rw() returns a MemTxResult type. Do not discard
it, return it to the caller.
Since dma_buf_rw() was previously returning the QEMUSGList
size not consumed, add an extra argument where this size
can be stored.
Update the 2 callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-14-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_buf_read().
Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-13-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_buf_write().
Keep the default MEMTXATTRS_UNSPECIFIED in the few callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-12-philmd@redhat.com>
Let devices specify transaction attributes when calling dma_buf_rw().
Keep the default MEMTXATTRS_UNSPECIFIED in the 2 callers.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-11-philmd@redhat.com>
DMA operations are run on any kind of buffer, not arrays of
uint8_t. Convert dma_buf_read/dma_buf_write functions to take
a void pointer argument and save us pointless casts to uint8_t *.
Remove this pointless casts in the megasas device model.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-9-philmd@redhat.com>
DMA operations are run on any kind of buffer, not arrays of
uint8_t. Convert dma_buf_rw() to take a void pointer argument
to save us pointless casts to uint8_t *.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211223115554.3155328-8-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_memory_map().
Patch created mechanically using spatch with this script:
@@
expression E1, E2, E3, E4;
@@
- dma_memory_map(E1, E2, E3, E4)
+ dma_memory_map(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-7-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_memory_rw().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-5-philmd@redhat.com>
Let devices specify transaction attributes when calling
dma_memory_set().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211223115554.3155328-3-philmd@redhat.com>
When we set cpu->cflags_next_tb it is because we want to carefully
control the execution of the next TB. Currently there is a race that
causes the second stage of watchpoint handling to get ignored if an
IRQ is processed before we finish executing the instruction that
triggers the watchpoint. Use the new CF_NOIRQ facility to avoid the
race.
We also suppress IRQs when handling precise self modifying code to
avoid unnecessary bouncing.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211129140932.4115115-3-alex.bennee@linaro.org>
To work correctly -dump-vmstate and vmstate-static-checker.py need to
dump all the supported vmstates.
But as some devices can be modules, they are not loaded at startup and not
dumped. Fix that by loading all available modules before dumping the
machine vmstate.
Fixes: 7ab6e7fcce ("qdev: device module support")
Cc: kraxel@redhat.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211116072840.132731-1-lvivier@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
pci power management fixes
acpi hotplug fixes
misc other fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGSh40PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpH2AH/RLY+ONL98GT4D+hi8MGCOhN669jrPbAqJ1L
l9p2GxDoXSD4PCDReU3VCzLtRsxsv/camgMx/DnDxaZtdgKm8SlXGJutMtNHpTY6
PHZLMLoixZSsKi6Tm3xGut/FSzsTZl4Gc3rqwaryHQLqptNO+XQJSBmP+oEAdjAd
nKVepHRveOTVVBBzCmoNpFA+BEXRTItdfG0ZKprPkXUobc2jeV7ymkTX9s2OBLEf
/pE49tZj1K4ab8g4+RY4cFEFoDZbXZ55Aq3ck5LAb47qIr/1cXPVP7PxINmasy4Y
H+oTUpWLBM7rFbdP/GBFANu5HkEQ5pnjeosWYOKsopE4UFCyDxc=
=y9C8
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
pci,pc,virtio: bugfixes
pci power management fixes
acpi hotplug fixes
misc other fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 15 Nov 2021 05:15:09 PM CET
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
pcie: expire pending delete
pcie: fast unplug when slot power is off
pcie: factor out pcie_cap_slot_unplug()
pcie: add power indicator blink check
pcie: implement slot power control for pcie root ports
pci: implement power state
vdpa: Check for existence of opts.vhostdev
vdpa: Replace qemu_open_old by qemu_open at
virtio: use virtio accessor to access packed event
virtio: use virtio accessor to access packed descriptor flags
tests: bios-tables-test update expected blobs
hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
bios-tables-test: Allow changes in DSDT ACPI tables
hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
pcie: rename 'native-hotplug' to 'x-native-hotplug'
hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines
vhost: Fix last vq queue index of devices with no cvq
vhost: Rename last_index to vq_index_end
softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add an expire time for pending delete, once the time is over allow
pressing the attention button again.
This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20211111130859.1171890-7-kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported by Coverity (CID 1465222).
Fixes: 4a1d937796 ("softmmu/qdev-monitor: add error handling in qdev_set_id")
Cc: Damien Hedde <damien.hedde@greensocs.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211102163342.31162-1-stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Recent commit 6952026120 "monitor: Tidy up find_device_state()"
assumed the function's argument is "the device's ID or QOM path" (as
documented for device_del). It's actually either an absolute QOM
path, or a QOM path relative to /machine/peripheral/. Such a relative
path is a device ID when it doesn't contain a slash. When it does,
the function now always fails. Broke iotest 200, which uses relative
path "vda/virtio-backend".
It fails because object_resolve_path_component() resolves just one
component, not a relative path.
The obvious function to resolve relative paths is
object_resolve_path(). It picks a parent automatically. Too much
magic, we want to specify the parent. Create new
object_resolve_path_at() for that, and use it in find_device_state().
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20211019085711.86377-1-armbru@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBUeEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOh+Qf+OMRhRiv6dYjbK/5zXrx81AgxYAY3
dBUSr8v16LyrMl1U3DZWzhD+MzQsC83m/Xsh4lGxlHDWtkK9QQA5xDG95JZdY26i
MGCbbjnFHISbyBQV9Y724gPfPjOOODuoFbzafSx6VLITOcyv1ye0cm7TOjOPB+tt
E4c3JqTZ7g8a5yMe8ItkVhz5pPY+oVw8dxMNRp6Sup5Dbfx0DjacIwLasLsHfPL7
qBADfqB20ovHUzLjXu7oWgEd4KxJ6kiSCaJJu/KD36hg0wB8+WVP1o43j4PkczHT
QjU7eZaeaTrN5Cf34ttPge6QReMi5SFNCaA9O9/HLqrQgdEtt/diZWuqjQ==
=a2mC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Build system fixes and cleanups
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
# gpg: Signature made Tue 02 Nov 2021 10:57:37 AM EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* remotes/bonzini/tags/for-upstream: (27 commits)
configure: fix --audio-drv-list help message
configure: Remove the check for the __thread keyword
Move the l2tpv3 test from configure to meson.build
meson: remove unnecessary coreaudio test program
meson: remove pointless warnings
meson.build: Allow to disable OSS again
meson: bump submodule to 0.59.3
qtest/am53c974-test: add test for cancelling in-flight requests
esp: ensure in-flight SCSI requests are always cancelled
KVM: SVM: add migration support for nested TSC scaling
hw/i386: fix vmmouse registration
watchdog: remove select_watchdog_action
vl: deprecate -watchdog
watchdog: add information from -watchdog help to -device help
hw/i386: Rename default_bus_bypass_iommu
hvf: Avoid mapping regions < PAGE_SIZE as ram
configure: do not duplicate CPU_CFLAGS into QEMU_LDFLAGS
configure: remove useless NPTL probe
target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
optionrom: add a DMA-enabled multiboot ROM
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is a counterpart to the HMP "info ramblock" command. It is being
added with an "x-" prefix because this QMP command is intended as an
adhoc debugging tool and will thus not be modelled in QAPI as fully
structured data, nor will it have long term guaranteed stability.
The existing HMP command is rewritten to call the QMP command.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Instead of invoking select_watchdog_action from both HMP and command line,
go directly from HMP to QMP and use QemuOpts as the intermediary for the
command line.
This makes -watchdog-action explicitly a shortcut for "-action watchdog",
so that "-watchdog-action" and "-action watchdog" override each other
based on the position on the command line; previously, "-action watchdog"
always won.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-watchdog is the same as -device except that it is case insensitive (and it
allows only watchdog devices of course). Now that "-device help" can list
as such the available watchdog devices, we can deprecate it.
Note that even though -watchdog tries to be case insensitive, it fails
at that: "-watchdog i6300xyz" fails with "Unknown -watchdog device",
but "-watchdog i6300ESB" also fails (when the generated -device option
is processed) with an error "'i6300ESB' is not a valid device model name".
For this reason, the documentation update does not mention the case
insensitivity of -watchdog.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hi
this includes pending bits of migration patches.
- virtio-mem support by David Hildenbrand
- dirtyrate improvements by Hyman Huang
- fix rdma wrid by Li Zhijian
- dump-guest-memory fixes by Peter Xu
Pleas apply.
Thanks, Juan.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmGAZEAACgkQ9IfvGFhy
1yPMlxAAx3HRMTCqlluM6B28TKHpGmg7O87g6F0U5fRZNJEro+8p08zYC1Yo2HNm
Po7dd++lZxcGPKrq7q1IKPH+wbQ5Yg/3jCeruXP2GRq3AKo9MyUK4WKd2BKRZbnl
q2oioUSLKYmsUqyl6YI/8nlgyDvmdGet8+GHxhmG5fVNGabWnGhwJDlCbOh1LAqb
cqACvahXuIVj3X7nMbz3e3Xy4YY/hJqJb3+e0DrQwlPDQRLDhadlQ7zv9vJ75BeY
Lt0/jnYI223m5LuiTecjv1S9AQjQpqJZq9N2K9miXmd3jtVkm2iqHdXZDK/Sr5oO
TE5OCf8xtFEcZ2KNwxQYMW+gkx2Gj6aoxIobu3HJ5kELErmvVhdnM7rkLmSHf8WB
Un/O55xUE/Hyg4G/oZOjAwk6eHS7RM+fIBq5wDGn5MNyYpBXid6JhWxSKv0i/gFX
8JA5i8wyzkUD23c8Ez+Ms6nmIL9LJS7xpVx9jqV2fNBdf+15opHg2ufnB5NnQ9y8
JJkzPjW2xKh5EsznY8iDeTztN7Im9Bn+4VcNl53Okugh5QFlTOtcAE21EjPrhv0K
XC6PJmDnSZenhJkhgXeDzUe4wZu9wvAjH/R/yTVrW2jT51Azebw3dtreX8F/Dqap
n+T+jupShCrrNFw0tCWsuLu+OZJrSwA83tFo+6DfH/idi0CJoJs=
=8B3Y
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/juanquintela/tags/migration-20211031-pull-request' into staging
Migration Pull request
Hi
this includes pending bits of migration patches.
- virtio-mem support by David Hildenbrand
- dirtyrate improvements by Hyman Huang
- fix rdma wrid by Li Zhijian
- dump-guest-memory fixes by Peter Xu
Pleas apply.
Thanks, Juan.
# gpg: Signature made Mon 01 Nov 2021 06:03:44 PM EDT
# gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg: aka "Juan Quintela <quintela@trasno.org>" [full]
* remotes/juanquintela/tags/migration-20211031-pull-request:
migration/dirtyrate: implement dirty-bitmap dirtyrate calculation
memory: introduce total_dirty_pages to stat dirty pages
migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshots
migration/ram: Factor out populating pages readable in ram_block_populate_pages()
migration: Simplify alignment and alignment checks
migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destination
virtio-mem: Drop precopy notifier
migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration source
virtio-mem: Implement replay_discarded RamDiscardManager callback
memory: Introduce replay_discarded callback for RamDiscardManager
dump-guest-memory: Block live migration
migration: Add migrate_add_blocker_internal()
migration: Make migration blocker work for snapshots too
migration/dirtyrate: implement dirty-ring dirtyrate calculation
migration/dirtyrate: move init step of calculation to main thread
migration/dirtyrate: adjust order of registering thread
migration/dirtyrate: introduce struct and adjust DirtyRateStat
memory: make global_dirty_tracking a bitmask
KVM: introduce dirty_pages and kvm_dirty_ring_enabled
migration/rdma: Fix out of order wrid
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Introduce replay_discarded callback similar to our existing
replay_populated callback, to be used my migration code to never migrate
discarded memory.
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
since dirty ring has been introduced, there are two methods
to track dirty pages of vm. it seems that "logging" has
a hint on the method, so rename the global_dirty_log to
global_dirty_tracking would make description more accurate.
dirty rate measurement may start or stop dirty tracking during
calculation. this conflict with migration because stop dirty
tracking make migration leave dirty pages out then that'll be
a problem.
make global_dirty_tracking a bitmask can let both migration and
dirty rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION
and GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty
tracking aims for, migration or dirty rate.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Add an early check to test if the requested sysbus device type
is allowed by the current machine before creating the device. This
impacts both -device cli option and device_add qmp command.
Before this patch, the check was done well after the device has
been created (in a machine init done notifier). We can now report
the error right away.
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20211029142258.484907-3-damien.hedde@greensocs.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Watchpoints that should fire after the memory access
break an execution of the current block, try to
translate current instruction into the separate block,
which then causes debug interrupt.
But cpu_interrupt can't be called in such block when
icount is enabled, because interrupts muse be allowed
explicitly.
This patch sets CF_LAST_IO flag for retranslated block,
allowing interrupt request for the last instruction.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169727.2127597.8141772572696627329.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
cpu_check_watchpoint function checks cpu->watchpoint_hit at the entry.
But then it also does the same in the middle of the function,
while this field can't change.
That is why this patch removes this useless condition.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542169094.2127597.8801843697434113110.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Watchpoint processing code restores vCPU state twice:
in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state.
Normally it does not affect anything, but in icount mode instruction
counter is incremented twice and becomes incorrect.
This patch eliminates unneeded CPU state restore.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <163542168516.2127597.8781375223437124644.stgit@pasha-ThinkPad-X280>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Fix the comment to match what the code is doing, as explained in
the changelog of commit 86cf9e1546
that introduced the change:
Commit 9458a9a1df added synchronization
of vCPU and migration operations through calling run_on_cpu operation.
However, in replay mode this synchronization is unneeded, because
I/O and vCPU threads are already synchronized.
This patch disables such synchronization for record/replay mode.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <163429018454.1146856.3429437540871060739.stgit@bahia.huguette>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it
also adds all missing subnodes from the given path. We'll use it
in a coming patch where we will add cpu-map to the device tree.
And we also tweak an error message of qemu_fdt_add_subnode().
Co-developed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211020142125.7516-3-wangyanan55@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit f3a8505656 ("qdev/qbus: add hidden device support") has
introduced a generic way to hide a device but it has modified
qdev_device_add() to check a specific option of the failover device,
"failover_pair_id", before calling the generic mechanism.
It's not needed (and not generic) to do that in qdev_device_add() because
this is also checked by the failover_hide_primary_device() function that
uses the generic mechanism to hide the device.
Cc: Jens Freimann <jfreimann@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20211019071532.682717-3-lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Like we already do for -object, introduce support for JSON syntax in
-device, which can be kept stable in the long term and guarantees that a
single code path with identical behaviour is used for both QMP and the
command line. Compared to the QemuOpts based code, the parser contains
less surprises and has support for non-scalar options (lists and
structs). Switching management tools to JSON means that we can more
easily change the "human" CLI syntax from QemuOpts to the keyval parser
later.
In the QAPI schema, a feature flag is added to the device-add command to
allow management tools to detect support for this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211008133442.141332-16-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QDicts are both what QMP natively uses and what the keyval parser
produces. Going through QemuOpts isn't useful for either one, so switch
the main device creation function to QDicts. By sharing more code with
the -object/object-add code path, we can even reduce the code size a
bit.
This commit doesn't remove the detour through QemuOpts from any code
path yet, but it allows the following commits to do so.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211008133442.141332-15-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
hide_device() is used for virtio-net failover, where the standby virtio
device delays creation of the primary device. It only makes sense to
have a single primary device for each standby device. Adding a second
one should result in an error instead of hiding it and never using it
afterwards.
Prepare for this by adding an Error parameter to the hide_device()
callback where virtio-net is informed about adding a primary device.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211008133442.141332-12-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qdev_set_id() is mostly used when the user adds a device (using
-device cli option or device_add qmp command). This commit adds
an error parameter to handle the case where the given id is
already taken.
Also document the function and add a return value in order to
be able to capture success/failure: the function now returns the
id in case of success, or NULL in case of failure.
The commit modifies the 2 calling places (qdev-monitor and
xen-legacy-backend) to add the error object parameter.
Note that the id is, right now, guaranteed to be unique because
all ids came from the "device" QemuOptsList where the id is used
as key. This addition is a preparation for a future commit which
will relax the uniqueness.
Signed-off-by: Damien Hedde <damien.hedde@greensocs.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211008133442.141332-10-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
DeviceState.id is a pointer to a string that is stored in the QemuOpts
object DeviceState.opts and freed together with it. We want to create
devices without going through QemuOpts in the future, so make this a
separately allocated string.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211008133442.141332-9-kwolf@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only thing the string visitor adds compared to a keyval visitor is
list support. git grep for 'visit_start_list' and 'visit.*List' shows
that devices don't make use of this.
In a world with a QAPIfied command line interface, the keyval visitor is
used to parse the command line. In order to make sure that no devices
start using this feature that would make backwards compatibility harder,
just switch away from object_property_parse(), which internally uses the
string visitor, to a keyval visitor and object_property_set().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211008133442.141332-8-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add cpu_ld/st_mmu memory primitives.
Move helper_ld/st memory helpers out of tcg.h.
Canonicalize alignment flags in MemOp.
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmFnG/0dHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV/P8Qf/TIb+nP/q4ZesoHV5
hNuKIMcGMiIWjP7YkuXg7H8n4QQxSK+nKXI3qlWCTIVtKOQFC3jkqNnxV8ncHUyS
RW6ePEcmJfb+yv20MnDLObxMcAq6mIkHtOjARQcvcHiXxMNEZdIvJ8f8/qrkYib1
RRJarqIGlYFJvGyfbplq/JA/WYcJleIElEUx7JPSewz38Kk0gDIH2+BR2TBFrWAD
TDfh+GvlHeX8IYU19rWnt7pFv8TVPVQODqJBtlRPEYnl+LGdpJPCP2ATUAggWHiA
hucYKsuMWXXXhGx2nsurkpSNrBfGe6OHybOE5d1ARqmq0MnyHJat+ryh6qTx3Z9w
oZKi+Q==
=QpK0
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20211013' into staging
Use MO_128 for 16-byte atomic memory operations.
Add cpu_ld/st_mmu memory primitives.
Move helper_ld/st memory helpers out of tcg.h.
Canonicalize alignment flags in MemOp.
# gpg: Signature made Wed 13 Oct 2021 10:48:45 AM PDT
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* remotes/rth/tags/pull-tcg-20211013:
tcg: Canonicalize alignment flags in MemOp
tcg: Move helper_*_mmu decls to tcg/tcg-ldst.h
target/arm: Use cpu_*_mmu instead of helper_*_mmu
target/sparc: Use cpu_*_mmu instead of helper_*_mmu
target/s390x: Use cpu_*_mmu instead of helper_*_mmu
target/mips: Use 8-byte memory ops for msa load/store
target/mips: Use cpu_*_data_ra for msa load/store
accel/tcg: Move cpu_atomic decls to exec/cpu_ldst.h
accel/tcg: Add cpu_{ld,st}*_mmu interfaces
target/hexagon: Implement cpu_mmu_index
target/s390x: Use MO_128 for 16 byte atomics
target/ppc: Use MO_128 for 16 byte atomics
target/i386: Use MO_128 for 16 byte atomics
target/arm: Use MO_128 for 16 byte atomics
memory: Log access direction for invalid accesses
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In memory_region_access_valid() invalid accesses are logged to help
debugging but the log message does not say if it was a read or write.
Log that too to better identify the access causing the problem.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20211011173616.F1DE0756022@zero.eik.bme.hu>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit 6287d827d4 "monitor: allow device_del to accept QOM paths"
extended find_device_state() to accept QOM paths in addition to qdev
IDs. This added a checked conversion to TYPE_DEVICE at the end, which
duplicates the check done for the qdev ID case earlier, except it sets
a *different* error: GenericError "ID is not a hotpluggable device"
when passed a QOM path, and DeviceNotFound "Device 'ID' not found"
when passed a qdev ID. Fortunately, the latter won't happen as long
as we add only devices to /machine/peripheral/.
Earlier, commit b6cc36abb2 "qdev: device_del: Search for to be
unplugged device in 'peripheral' container" rewrote the lookup by qdev
ID to use QOM instead of qdev_find_recursive(), so it can handle
buss-less devices. It does so by constructing an absolute QOM path.
Works, but object_resolve_path_component() is easier. Switching to it
also gets rid of the unclean duplication described above.
While there, avoid converting to TYPE_DEVICE twice, first to check
whether it's possible, and then for real.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210916111707.84999-1-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
virtio-mem logically plugs/unplugs memory within a sparse memory region
and notifies via the RamDiscardManager interface when parts become
plugged (populated) or unplugged (discarded).
Currently, we end up (via the two users)
1) zeroing all logically unplugged/discarded memory during TPM resets.
2) reading all logically unplugged/discarded memory when dumping, to
figure out the content is zero.
1) is always bad, because we assume unplugged memory stays discarded
(and is already implicitly zero).
2) isn't that bad with anonymous memory, we end up reading the zero
page (slow and unnecessary, though). However, once we use some
file-backed memory (future use case), even reading will populate memory.
Let's cut out all parts marked as not-populated (discarded) via the
RamDiscardManager. As virtio-mem is the single user, this now means that
logically unplugged memory ranges will no longer be included in the
dump, which results in smaller dump files and faster dumping.
virtio-mem has a minimum granularity of 1 MiB (and the default is usually
2 MiB). Theoretically, we can see quite some fragmentation, in practice
we won't have it completely fragmented in 1 MiB pieces. Still, we might
end up with many physical ranges.
Both, the ELF format and kdump seem to be ready to support many
individual ranges (e.g., for ELF it seems to be UINT32_MAX, kdump has a
linear bitmap).
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Claudio Fontana <cfontana@suse.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210727082545.17934-5-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's factor out adding a MemoryRegionSection to the list, to be reused in
RamDiscardManager context next.
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Claudio Fontana <cfontana@suse.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210727082545.17934-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's make sure to not merge when different memory regions are involved.
Unlikely, but theoretically possible.
Acked-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Claudio Fontana <cfontana@suse.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210727082545.17934-3-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Trace at memory_region_sync_dirty_bitmap() for log_sync() or global_log_sync()
on memory regions. One trace line should suffice when it finishes, so as to
estimate the time used for each log sync process.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210817013706.30986-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a name field for all the memory listeners. It can be used to identify
which memory listener is which.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that
looks and acts like RAM but is inaccessible via normal mechanisms,
including DMA. Use the flag to skip protected memory regions when
mapping RAM for DMA in VFIO.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
By default, QEMU will allow devices to be plugged into a bus up to
the bus class's device count limit. If the user creates a device on
the command line or via the monitor and doesn't explicitly specify
the bus to plug it in, QEMU will plug it into the first non-full bus
that it finds.
This is fine in most cases, but some machines have multiple buses of
a given type, some of which are dedicated to on-board devices and
some of which have an externally exposed connector for user-pluggable
devices. One example is I2C buses.
Provide a new function qbus_mark_full() so that a machine model can
mark this kind of "internal only" bus as 'full' after it has created
all the devices that should be plugged into that bus. The "find a
non-full bus" algorithm will then skip the internal-only bus when
looking for a place to plug in user-created devices.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210903151435.22379-2-peter.maydell@linaro.org
It's not that much complicated to type "-display sdl" or "-display curses",
so we should not clutter our main option name space with such simple
wrapper options and rather present the users with a concise interface
instead. Thus let's deprecate the "-sdl" and "-curses" wrapper options now.
Message-Id: <20210825092023.81396-4-thuth@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The alt_grab and ctrl_grab parameter of the -display sdl option prevent
the QAPIfication of the "sdl" part of the -display option, so we should
eventually remove them. And since this feature is also rather niche anyway,
we should not clutter the top-level option list with these, so let's
also deprecate the "-alt-grab" and the "-ctrl-grab" options while we're
at it.
Once the deprecation period of "alt_grab" and "ctrl_grab" is over, we
then can finally switch the -display sdl option to use QAPI internally,
too.
Message-Id: <20210825092023.81396-3-thuth@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The -display sdl option is not using QAPI internally yet, and uses hand-
crafted parsing instead (see parse_display() in vl.c), which is quite
ugly, since most of the other code is using the QAPIfied DisplayOption
already. Unfortunately, the "alt_grab" and "ctrl_grab" use underscores in
their names which has recently been forbidden in new QAPI code, so
a straight conversion is not possible. While we could add some exceptions
to the QAPI schema parser for this, the way these parameters have been
designed was maybe a bad idea anyway: First, it's not possible to enable
both parameters at the same time, thus instead of two boolean parameters
it would be better to have only one multi-choice parameter instead.
Second, the naming is also somewhat unfortunate since the "alt_grab"
parameter is not about the ALT key, but rather about the left SHIFT key
that has to be used additionally when the parameter is enabled.
So instead of trying to QAPIfy "alt_grab" and "ctrl_grab", let's rather
introduce an alternative to these parameters instead, a new parameter
called "grab-mod" which can either be set to "lshift-lctrl-lalt" or to
"rctrl". In case we ever want to support additional modes later, we can
then also simply extend the list of supported strings here.
Message-Id: <20210825092023.81396-2-thuth@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>