This moves the details of the ISA bus creation under the LPC model but
more important, the new PnvChip operation will let us choose the chip
class to use when we introduce the different chip classes for Power9
and Power8. It hides away the processor chip controllers from the
machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power9, the thread interrupt presenter has a different type and is
linked to the chip owning the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
- accommodate guests using vfio-ccw without specifying unlimited
prefetch, but actually working fine
- add cpu model for the z14 Model ZR1
- add support for pxelinux.cfg-style network booting to the s390x
firmware
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlsozdISHGNvaHVja0By
ZWRoYXQuY29tAAoJEN7Pa5PG8C+vafQQAIUl+TPrwXQQCp9xBy+CpMAAQtPp0PI2
DGisaVohLQARfNorqDXqPWMy9ulbYQ1bXcdpbhVwzIHlBZopMAvC+QTRwqs4EIFw
0vjvm5ttC2b3zt+7BAom2xWqaxmSen02O0Qj2l5HpFBVp+5afT74geiWhmmirWbw
x/JEEnlOcP2dHAvujfbv9Slst1B/OQ4xz7gf5IOlfX2MCsDcIyvqrw6a2D5G4PlX
DHPZ7PzA6edoG+qP1DvH6Myi19zs3edAKJp5sdCK92FgJBxW6nI1+ThoN3ApbuDm
U7HYeLzrQ/dXWiHMwA+2QP5Iqe/VfKCvqVcm9+m458dQLiN8ZiyGmUbbZtL0L0X8
KQB70mSQjawWRVLr1vsvZNAnCeqjT2hZz37EAwA3gGOeIfFjxFLNtBu4WKgt4soG
7Uq99irLBkfJKY56gQUc0LYOKhTMlDk3Z3xzv2Y2q29T+Cp0927ruHQdRFo3xQAu
kBc1pIjt2Rfuh2EJkyoprMuM/e5E8OZZv1uLbqUOYKxIHCHp05wqNEva5WOhS5Ah
05Un03Vw/CVWnXuNSYR+eKrMydIZALA+GTpQYZuiAo8+dT9R3dmTHjQRqOdck3lB
BU1MzU3kfflmmToi4Pto1PtNPeAaCWMcFNLOgky3L+Wfx2p4WQxYZP2g42be0K4U
m4G6RXHStHCy
=hMax
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20180619' into staging
- cleanup in virtio-ccw
- accommodate guests using vfio-ccw without specifying unlimited
prefetch, but actually working fine
- add cpu model for the z14 Model ZR1
- add support for pxelinux.cfg-style network booting to the s390x
firmware
# gpg: Signature made Tue 19 Jun 2018 10:33:06 BST
# gpg: using RSA key DECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20180619:
pc-bios/s390-ccw: Update the s390-netboot.img binary
pc-bios/s390-ccw: Optimize the s390-netboot.img for size
pc-bios/s390-ccw/net: Try to load pxelinux.cfg file accoring to the UUID
pc-bios/s390-ccw/net: Add support for pxelinux-style config files
pc-bios/s390-ccw/net: Update code for the latest changes in SLOF
roms: Update SLOF submodule to current status
pc-bios/s390-ccw: define loadparm length
s390x/cpumodels: add z14 Model ZR1
s390x/ipl: Try to detect Linux vs non Linux for initial IPL PSW
vfio-ccw: remove orb.c64 (64 bit data addresses) check
vfio-ccw: add force unlimited prefetch property
virtio-ccw: clean up notify
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Next batch of ppc and spapr related patches for the 3.0 release.
* Improved handling of Spectre/Meltdown mitigations for POWER8
* Numerous Mac machine type cleanups and improvements
* Cleanup to cpu realize/unrealize path for spapr
* Create a place for machine-specific per-cpu information, and
start moving some things to it
* Assorted bugfixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlsnLIUACgkQbDjKyiDZ
s5KrEQ//dzJthAN4beCbBKdeY6sOTsmOTMbrKAZyH0zMZBgn1QyfghwqY0iGmbDA
zDJQM62Q4hVUtELWslROqHbdVAo3wa8w3mOddqGeVr3Jla6VGpJxNzJVUgGgGDK0
OoInkljzqnlOGMKn09aW2v21dBbiLakplNDJOCJXw1ZKg75JYgAmo1no/1lgbAUQ
9qBGbGA5wswUK/o/cBatqG180BWDmRLiHyWWAMJzQrvAuz4WvkhRhkBecA14OK2E
Iw3l0LJ8acd4zKVeUhh/FOlsViRi/lTKYvDte14SIADEfvpBW77Ts49NG/vGJFHu
OOEhsdOBIIrMi05lUkeAqGJuECw0Y8ADBfmeB9Z03QeBCio45CFEifPSsFvmi4k9
RpRUHOOSJR0QOB9vp8Ryy68krFcX0NDuH7YnOn3hTFoJiiUC/SNs1O+3CYL0fYdm
IgdN3SoFWJ9vJciBGZuUwBxulVT5z6JoPfAzi8Gz9euebnhcCwBP2gqpUICon0SG
ELnSNRgnTp1GNbAOFb23nItJeyRtf0BisBuNR34Kv4OWUvWEKBE0BRsPi+VQnlxB
PzUgJpibfdMBBqsotNAt9zefQRamrphl12MMR96+melWPfNLsFz+sxMUNji+50lG
rLVgk1/ySEynuU6g3rYegUZSW9W7Iq/5e7CKQ5Iv9IV219f+reg=
=CAxa
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180618' into staging
ppc patch queue 2018-06-18
Next batch of ppc and spapr related patches for the 3.0 release.
* Improved handling of Spectre/Meltdown mitigations for POWER8
* Numerous Mac machine type cleanups and improvements
* Cleanup to cpu realize/unrealize path for spapr
* Create a place for machine-specific per-cpu information, and
start moving some things to it
* Assorted bugfixes
# gpg: Signature made Mon 18 Jun 2018 04:52:37 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.0-20180618: (28 commits)
spapr: fix xics_system_init() error path
target/ppc, spapr: Move VPA information to machine_data
ppc/pnv: introduce a pnv_chip_core_realize() routine
spapr_cpu_core: introduce spapr_create_vcpu()
spapr_cpu_core: add missing rollback on realization path
spapr_cpu_core: fix potential leak in spapr_cpu_core_realize()
spapr_cpu_core: convert last snprintf() to g_strdup_printf()
pnv: Add cpu unrealize path
pnv: Clean up cpu realize path
pnv_core: Allocate cpu thread objects individually
pnv: Fix some error handling cpu realize()
spapr: Clean up cpu realize/unrealize paths
sm501: Do not clear read only bits when writing registers
mos6522: expose mos6522_update_irq() through MOS6522DeviceClass
mos6522: remove additional interrupt flag filter from mos6522_update_irq()
mos6522: only clear the shift register interrupt upon write
xics_kvm: fix a build break
mac_newworld: add PMU device
adb: add property to disable direct reg 3 writes
adb: fix read reg 3 byte ordering
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch allows the user to specify whether to use active or only
background mode for mirror block jobs. Currently, this setting will
remain constant for the duration of the entire block job.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-14-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch implements active synchronous mirroring. In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target. Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).
Active mode is completely optional and currently disabled at runtime. A
later patch will add a way for users to enable it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-13-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180613181823.13618-12-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This will allow us to access the block job data when the mirror block
driver becomes more complex.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-11-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new function allows to look for a consecutively dirty area in a
dirty bitmap.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-10-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a function that wraps hbitmap_iter_next() and always calls it in
non-advancing mode first, and in advancing mode next. The result should
always be the same.
By using this function everywhere we called hbitmap_iter_next() before,
we should get good test coverage for non-advancing hbitmap_iter_next().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-9-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new parameter allows the caller to just query the next dirty
position without moving the iterator.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180613181823.13618-8-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Currently, bdrv_replace_node() refuses to create loops from one BDS to
itself if the BDS to be replaced is the backing node of the BDS to
replace it: Say there is a node A and a node B. Replacing B by A means
making all references to B point to A. If B is a child of A (i.e. A has
a reference to B), that would mean we would have to make this reference
point to A itself -- so we'd create a loop.
bdrv_replace_node() (through should_update_child()) refuses to do so if
B is the backing node of A. There is no reason why we should create
loops if B is not the backing node of A, though. The BDS graph should
never contain loops, so we should always refuse to create them.
If B is a child of A and B is to be replaced by A, we should simply
leave B in place there because it is the most sensible choice.
A more specific argument would be: Putting filter drivers into the BDS
graph is basically the same as appending an overlay to a backing chain.
But the main child BDS of a filter driver is not "backing" but "file",
so restricting the no-loop rule to backing nodes would fail here.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-7-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
With this, the mirror_top_bs is no longer just a technically required
node in the BDS graph but actually represents the block job operation.
Also, drop MirrorBlockJob.source, as we can reach it through
mirror_top_bs->backing.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-6-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This patch makes the mirror code differentiate between simply waiting
for any operation to complete (mirror_wait_for_free_in_flight_slot())
and specifically waiting for all operations touching a certain range of
the virtual disk to complete (mirror_wait_on_conflicts()).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-5-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Attach a CoQueue to each in-flight operation so if we need to wait for
any we can use it to wait instead of just blindly yielding and hoping
for some operation to wake us.
A later patch will use this infrastructure to allow requests accessing
the same area of the virtual disk to specifically wait for each other.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180613181823.13618-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created. mirror_perform() is going to be
that point.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20180613181823.13618-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Introduce a new global big lock for mon_fdsets. Take it where needed.
The monitor_fdset_get_fd() handling is a bit tricky: now we need to call
qemu_mutex_unlock() which might pollute errno, so we need to make sure
the correct errno be passed up to the callers. To make things simpler,
we let monitor_fdset_get_fd() return the -errno directly when error
happens, then in qemu_open() we move it back into errno.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-8-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Before this patch, monitor fd helpers might be called even earlier than
monitor_init_globals(). This can be problematic.
After previous work, now monitor_init_globals() does not depend on
accelerator initialization any more. Call it earlier (before CLI
parsing; that's where the monitor APIs might be called) to make sure it
is called before any of the monitor APIs.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-7-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Instead, use a dynamic function to detect which clock we'll use. The
problem is that the old code will let monitor initialization depend on
configure_accelerator() (that's where qtest_enabled() start to take
effect). After this change, we don't have such a dependency any more.
We just need to make sure configure_accelerator() is called when we
start to use it. Now it's only used in monitor_qapi_event_queue() and
monitor_qapi_event_handler(), so we're good.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-6-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[monitor_get_event_clock() name and comment tweaked]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Fix typo in d622cb5879. Meanwhile move these variables close to each
other. monitor_qapi_event_state can be declared static, add that.
Reported-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-5-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add some explicit comments for both Readline and cpu_set/cpu_get helpers
that they do not need the mon_lock protection.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-4-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
mon->fds were protected by BQL. Now protect it by mon_lock so that it
can even be used in monitor iothread.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-3-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The out_lock is protecting a few Monitor fields. In the future the
monitor code will start to run in multiple threads. We are going to
turn it into a bigger lock to protect not only the out buffer but also
most of the rest.
Since at it, rearrange the Monitor struct a bit.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180608035511.7439-2-peterx@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The -O2 optimization flag is passed via CFLAGS to the firmware Makefile,
but in netbook.mak, we've got some rules that only use QEMU_CFLAGS for
compiling the libc and libnet from SLOF, so these files get compiled
without optimization so far. Use CFLAGS here, too, to create faster
and smaller code.
We can additionally save some more bytes in the firmware images by compi-
ling the code with -fno-asynchronous-unwind-tables. This will omit some
ELF sections (used for stack unwinding for example) from the image that
we do not need in the firmware.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
With the STSI instruction, we can get the UUID of the current VM instance,
so we can support loading pxelinux config files via UUID in the file name,
too.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Since it is quite cumbersome to manually create a combined kernel with
initrd image for network booting, we now support loading via pxelinux
configuration files, too. In these files, the kernel, initrd and command
line parameters can be specified seperately, and the firmware then takes
care of glueing everything together in memory after the files have been
downloaded. See this URL for details about the config file layout:
https://www.syslinux.org/wiki/index.php?title=PXELINUX
The user can either specify a config file directly as bootfile via DHCP
(but in this case, the file has to start either with "default" or a "#"
comment so we can distinguish it from binary kernels), or a folder (i.e.
the bootfile name must end with "/") where the firmware should look for
the typical pxelinux.cfg file names, e.g. based on MAC or IP address.
We also support the pxelinux.cfg DHCP options 209 and 210 from RFC 5071.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The ip_version information now has to be stored in the filename_ip_t
structure, and there is now a common function called tftp_get_error_info()
which can be used to get the error string for a TFTP error code.
We can also get rid of some superfluous "(char *)" casts now.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Loadparm is defined by the s390 architecture to be 8 bytes
in length. Let's define this size in the s390-ccw bios.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Removing a drive with drive_del while it is being used to run an I/O
intensive workload can cause QEMU to crash.
An AIO flush can yield at some point:
blk_aio_flush_entry()
blk_co_flush(blk)
bdrv_co_flush(blk->root->bs)
...
qemu_coroutine_yield()
and let the HMP command to run, free blk->root and give control
back to the AIO flush:
hmp_drive_del()
blk_remove_bs()
bdrv_root_unref_child(blk->root)
child_bs = blk->root->bs
bdrv_detach_child(blk->root)
bdrv_replace_child(blk->root, NULL)
blk->root->bs = NULL
g_free(blk->root) <============== blk->root becomes stale
bdrv_unref(child_bs)
bdrv_delete(child_bs)
bdrv_close()
bdrv_drained_begin()
bdrv_do_drained_begin()
bdrv_drain_recurse()
aio_poll()
...
qemu_coroutine_switch()
and the AIO flush completion ends up dereferencing blk->root:
blk_aio_complete()
scsi_aio_complete()
blk_get_aio_context(blk)
bs = blk_bs(blk)
ie, bs = blk->root ? blk->root->bs : NULL
^^^^^
stale
The problem is that we should avoid making block driver graph
changes while we have in-flight requests. Let's drain all I/O
for this BB before calling bdrv_root_unref_child().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This tests both adding and remove a node between bdrv_drain_all_begin()
and bdrv_drain_all_end(), and enabled the existing detach test for
drain_all.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
did a subtree drain for each of them. This works fine as long as the
graph is static, but sadly, reality looks different.
If the graph changes so that root nodes are added or removed, we would
have to compensate for this. bdrv_next() returns each root node only
once even if it's the root node for multiple BlockBackends or for a
monitor-owned block driver tree, which would only complicate things.
The much easier and more obviously correct way is to fundamentally
change the way the functions work: Iterate over all BlockDriverStates,
no matter who owns them, and drain them individually. Compensation is
only necessary when a new BDS is created inside a drain_all section.
Removal of a BDS doesn't require any action because it's gone afterwards
anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the future, bdrv_drained_all_begin/end() will drain all invidiual
nodes separately rather than whole subtrees. This means that we don't
want to propagate the drain to all parents any more: If the parent is a
BDS, it will already be drained separately. Recursing to all parents is
unnecessary work and would make it an O(n²) operation.
Prepare the drain function for the changed drain_all by adding an
ignore_bds_parents parameter to the internal implementation that
prevents the propagation of the drain to BDS parents. We still (have to)
propagate it to non-BDS parents like BlockBackends or Jobs because those
are not drained separately.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Before we can introduce a single polling loop for all nodes in
bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
context like we already do for bdrv_do_drained_begin().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_drain_all() wants to have a single polling loop for draining the
in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
condition relies on activity in multiple AioContexts, which is polled
from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
the mainloop thread and use the AioWait notification mechanism.
Just randomly picking the AioContext of any non-mainloop thread would
work, but instead of bothering to find such a context in the caller, we
can just as well accept NULL for ctx.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>